[00:27:47] <icinga-wm>	 PROBLEM - puppet last run on db1015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:55:46] <icinga-wm>	 RECOVERY - puppet last run on db1015 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[01:34:56] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 1817.086555 Seconds
[01:34:56] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 1817.833235 Seconds
[01:35:56] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 20.460992 Seconds
[01:35:56] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 21.254631 Seconds
[01:43:46] <icinga-wm>	 PROBLEM - puppet last run on elastic1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:58:26] <icinga-wm>	 PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:59:36] <grrrit-wm>	 (PS3) BBlack: nginx (1.11.4-1+wmf14) jessie-wikimedia; urgency=medium [software/nginx] (wmf-1.11.4) - https://gerrit.wikimedia.org/r/319776
[01:59:38] <grrrit-wm>	 (PS2) BBlack: add stapling-multi-file patch [software/nginx] (wmf-1.11.4) - https://gerrit.wikimedia.org/r/320115
[02:11:46] <icinga-wm>	 RECOVERY - puppet last run on elastic1017 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[02:16:45] <logmsgbot>	 !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.1) (duration: 05m 39s)
[02:16:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:21:02] <logmsgbot>	 !log l10nupdate@tin ResourceLoader cache refresh completed at Mon Nov  7 02:21:02 UTC 2016 (duration 4m 18s)
[02:21:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:26:26] <icinga-wm>	 RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[02:41:42] <grrrit-wm>	 (PS4) BBlack: nginx (1.11.4-1+wmf14) jessie-wikimedia; urgency=medium [software/nginx] (wmf-1.11.4) - https://gerrit.wikimedia.org/r/319776
[02:41:44] <grrrit-wm>	 (PS1) BBlack: remove debian perl ldflags patch [software/nginx] (wmf-1.11.4) - https://gerrit.wikimedia.org/r/320161
[02:41:46] <grrrit-wm>	 (PS1) BBlack: depend on lsb-base >= 3.0-6 [software/nginx] (wmf-1.11.4) - https://gerrit.wikimedia.org/r/320162
[02:53:56] <icinga-wm>	 PROBLEM - puppet last run on relforge1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:56:56] <icinga-wm>	 PROBLEM - puppet last run on elastic1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:22:56] <icinga-wm>	 RECOVERY - puppet last run on relforge1001 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[03:24:56] <icinga-wm>	 RECOVERY - puppet last run on elastic1020 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[03:27:26] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 696.88 seconds
[03:32:26] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 299.73 seconds
[03:59:46] <icinga-wm>	 PROBLEM - puppet last run on mc1022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:00:46] <icinga-wm>	 RECOVERY - Last backup of the maps filesystem on labstore1001 is OK: OK - Last run for unit replicate-maps was successful
[04:01:56] <icinga-wm>	 PROBLEM - Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/dumps - 288 bytes in 0.010 second response time
[04:04:46] <icinga-wm>	 PROBLEM - Last backup of the maps filesystem on labstore1001 is CRITICAL: CRITICAL - Last run result for unit replicate-maps was exit-code
[04:26:46] <icinga-wm>	 RECOVERY - puppet last run on mc1022 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[04:38:43] <arseny92>	 [05:45] * fishkin (~fishkin@wikipedia/Gamliel-Fishkin) has joined #wikimedia-tech
[04:38:43] <arseny92>	 [05:47] <fishkin> Hi, I want to report a bug. Is someone here to read my message?
[04:38:43] <arseny92>	 [05:50] <fishkin> The MediaWiki default bot, which updates translated pages from the Translatewiki, does not update at least 3 pages: MediaWiki:Cite section preview references/eo, MediaWiki:Cite warning/eo, MediaWiki:Cite warning sectionpreview no text/eo.
[04:39:16] <arseny92>	 scap-i18n update related i guess
[04:53:39] <greg-g>	 arseny92: maybe, a task would be good (#scap and #i18n)
[04:57:46] <icinga-wm>	 PROBLEM - puppet last run on chromium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:21:46] <icinga-wm>	 PROBLEM - puppet last run on seaborgium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:21:52] <arseny92>	 greg-g T150155
[05:21:53] <stashbot>	 T150155: Localization updates do not update some eo messages - https://phabricator.wikimedia.org/T150155
[05:23:56] <icinga-wm>	 PROBLEM - puppet last run on ms-be1024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:23:57] <greg-g>	 arseny92: ty
[05:25:46] <icinga-wm>	 RECOVERY - puppet last run on chromium is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[05:48:46] <icinga-wm>	 RECOVERY - puppet last run on seaborgium is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[05:51:56] <icinga-wm>	 RECOVERY - puppet last run on ms-be1024 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[06:04:36] <icinga-wm>	 PROBLEM - Juniper alarms on mr1-eqiad is CRITICAL: JNX_ALARMS CRITICAL - No response from remote host 208.80.154.199
[06:05:36] <icinga-wm>	 RECOVERY - Juniper alarms on mr1-eqiad is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms
[06:29:36] <icinga-wm>	 PROBLEM - Disk space on logstash1003 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=96%)
[06:30:26] <icinga-wm>	 PROBLEM - Disk space on logstash1001 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=96%)
[06:31:16] <icinga-wm>	 PROBLEM - Disk space on logstash1002 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=96%)
[06:40:46] <icinga-wm>	 PROBLEM - tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Not Available - 531 bytes in 0.026 second response time
[06:51:46] <icinga-wm>	 RECOVERY - tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.056 second response time
[06:59:50] <_joe_>	 ugh logstash
[07:02:26] <icinga-wm>	 RECOVERY - Disk space on logstash1001 is OK: DISK OK
[07:02:36] <_joe_>	 !log removing old logfiles on logstash hosts
[07:02:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:04:16] <icinga-wm>	 PROBLEM - puppet last run on db1030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:05:16] <icinga-wm>	 RECOVERY - Disk space on logstash1002 is OK: DISK OK
[07:05:36] <icinga-wm>	 RECOVERY - Disk space on logstash1003 is OK: DISK OK
[07:21:16] <icinga-wm>	 RECOVERY - puppet last run on db1030 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[07:34:37] <grrrit-wm>	 (PS2) Muehlenhoff: carbon_pickled: Restrict to production networks [puppet] - https://gerrit.wikimedia.org/r/319878
[07:41:46] <grrrit-wm>	 (PS1) Marostegui: db-codfw.php: Depool db2042 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/320166 (https://phabricator.wikimedia.org/T149553)
[07:41:56] <icinga-wm>	 RECOVERY - Host labstore2001 is UP: PING OK - Packet loss = 0%, RTA = 36.17 ms
[07:52:54] <grrrit-wm>	 (CR) Muehlenhoff: [C: 2] Depend on new ABI name [debs/linux-meta] - https://gerrit.wikimedia.org/r/319870 (owner: Muehlenhoff)
[07:53:56] <icinga-wm>	 RECOVERY - Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.220 second response time
[08:03:36] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: x1 on dbstore2001 is CRITICAL: CRITICAL slave_sql_state could not connect
[08:03:36] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s4 on dbstore2001 is CRITICAL: CRITICAL slave_sql_state could not connect
[08:03:56] <icinga-wm>	 PROBLEM - MariaDB Slave IO: m2 on dbstore2001 is CRITICAL: CRITICAL slave_io_state could not connect
[08:03:56] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s6 on dbstore2001 is CRITICAL: CRITICAL slave_sql_state could not connect
[08:04:06] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: m3 on dbstore2001 is CRITICAL: CRITICAL slave_sql_state could not connect
[08:04:06] <icinga-wm>	 PROBLEM - MariaDB Slave IO: m3 on dbstore2001 is CRITICAL: CRITICAL slave_io_state could not connect
[08:04:06] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s7 on dbstore2001 is CRITICAL: CRITICAL slave_sql_state could not connect
[08:04:06] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s1 on dbstore2001 is CRITICAL: CRITICAL slave_io_state could not connect
[08:04:06] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s7 on dbstore2001 is CRITICAL: CRITICAL slave_io_state could not connect
[08:04:07] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s5 on dbstore2001 is CRITICAL: CRITICAL slave_sql_state could not connect
[08:04:16] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: m2 on dbstore2001 is CRITICAL: CRITICAL slave_sql_state could not connect
[08:04:16] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s1 on dbstore2001 is CRITICAL: CRITICAL slave_sql_state could not connect
[08:04:16] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s3 on dbstore2001 is CRITICAL: CRITICAL slave_io_state could not connect
[08:04:16] <icinga-wm>	 PROBLEM - MariaDB Slave IO: x1 on dbstore2001 is CRITICAL: CRITICAL slave_io_state could not connect
[08:04:16] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s2 on dbstore2001 is CRITICAL: CRITICAL slave_sql_state could not connect
[08:04:17] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s6 on dbstore2001 is CRITICAL: CRITICAL slave_io_state could not connect
[08:04:17] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s2 on dbstore2001 is CRITICAL: CRITICAL slave_io_state could not connect
[08:04:18] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s3 on dbstore2001 is CRITICAL: CRITICAL slave_sql_state could not connect
[08:04:26] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s4 on dbstore2001 is CRITICAL: CRITICAL slave_io_state could not connect
[08:04:26] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s5 on dbstore2001 is CRITICAL: CRITICAL slave_io_state could not connect
[08:05:40] <marostegui>	 :(
[08:05:45] <marostegui>	 Looks like it went out from downtime
[08:06:38] <_joe_>	 ueaj
[08:06:43] <_joe_>	 *yeah
[08:06:49] <marostegui>	 I gave it a month of downtime while we test
[08:08:04] <Revent>	 _joe_: Heya…
[08:08:51] <Revent>	 Whatever ‘load balancing’ algorithm the video scalers use, it apparently sucks. 
[08:09:02] <_joe_>	 Revent: there is no load balancing there 
[08:09:08] <_joe_>	 it's a job processing system
[08:09:42] <Revent>	 Yeah, what I mean is…. sometimes it will crap all the tasks on one machine, and put it at like 75-80% load, while the other sits idle.
[08:10:41] <_joe_>	 doesn't seem like that's the case tbh looking at the graphs from last week
[08:11:01] <_joe_>	 but yeah, that can in theory happen under very particular circumstances
[08:11:01] <Revent>	 Re-transcoding these big broken files, even at 720p, sometimes fails even when the machines don’t go over 50% or so.
[08:11:47] <_joe_>	 that's because transcoding is mostly a single-cpu job
[08:11:54] <_joe_>	 and those machines have multiple cpus
[08:12:10] <grrrit-wm>	 (PS1) Muehlenhoff: Bump changelog [debs/linux-meta] - https://gerrit.wikimedia.org/r/320167
[08:12:52] <Revent>	 Right, I know, it’s just that trying to fix these ‘big’ transcodes, it seems rather sensitive to even fairly low levels of load
[08:13:00] <grrrit-wm>	 (CR) Muehlenhoff: [C: 2 V: 2] Bump changelog [debs/linux-meta] - https://gerrit.wikimedia.org/r/320167 (owner: Muehlenhoff)
[08:13:25] <Revent>	 When it’s dumped a lot on one and not the other, it’s not really apparent in the longer graphs because the ‘spikes’ aren’t that long.
[08:20:52] <Revent>	 _joe_: Let’s put it this way… yeah, it’s a single cpu job, and there are ~30 cpus, that would imply you could transcode a ‘lot’ of files at once, but if you throw even 5x of these big transcodes at 720p on at once, they will all error out. I can’t diagnose it, ofc, but trying to get these fixed seems quite touchy.
[08:21:20] <_joe_>	 we have a total of 8 cpus
[08:21:25] <_joe_>	 which is not much in fact
[08:21:35] <_joe_>	 did you open a bug by any chance?
[08:21:46] <Revent>	 Umm… lol, ganglia lies then.
[08:21:57] <_joe_>	 sorry, 16
[08:22:08] <_joe_>	 the 32 cores you see there are from HT
[08:22:13] <_joe_>	 (hyperthreading)
[08:22:14] <Revent>	 Ah.
[08:22:37] <_joe_>	 but yeah, throwing in a couple more machines could be a good idea
[08:22:51] <_joe_>	 actually, this is the classical case where an elastic environment would be optimal
[08:23:31] <Revent>	 TBH, it’s not the ‘big’ transcodes I care about, so much, but the thousands of little ones… they just aren’t in the short list that timedmediahandler makes visible.
[08:24:19] <arseny92>	 In IS at ln14076, 'wmgRSSUrlWhitelist' => ['mediawikiwiki' => has 'https://git.wikimedia.org/feed/mediawiki/extensions/Translate.git'. I guess that should be changed to the phab clone URL as per T139089? The redirect rule goes nowhere tbh, just to a list of all repos instead
[08:24:19] <stashbot>	 T139089: Fix references to git.wikimedia.org in all repos - https://phabricator.wikimedia.org/T139089
[08:24:59] <Revent>	 But knowing there are ‘half’ the cores that it says rather makes the level of jobs where it starts erroring them out make a lot more sense. :P
[08:25:43] <_joe_>	 Revent: the OS will see the number of cores reported by ganglia, but that's mostly an artifact
[08:25:44] <arseny92>	 However, that var seems to want an RSS feed, and I can't seem to find a repo RSS
[08:26:35] <Revent>	 Yeah, makes sense… also makes sense that hyperthreading won’t help with transcoding much if at all.
[08:28:32] <Revent>	 _joe_: FYI, I’ve not opened a bug because, tbh, I can’t really describe the ‘problem’ well at all, other than the transcoding system is twitchy and fails a lot on large videos. :P
[08:29:10] <_joe_>	 Revent: and, still very honestly, I would love to work on it but I have zero time for it
[08:29:18] <_joe_>	 I might open a ticket about it, though
[08:29:29] <Revent>	 Yeah, I understand that there are other more urgent things.
[08:30:10] <_joe_>	 it's more that we're very very thin on personnel :(
[08:30:33] <p858snake|L2>	 brion is already looking at it afaik, he needs to fix several issues first (namely the issue where JobRubber thinks a file is done when it's not, causing it to load another job, which overloads the system)
[08:30:35] <marostegui>	 !log Deploy schema change on s4 master (db2019) commonswiki.revision - T147305
[08:30:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:42] <stashbot>	 T147305: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305
[08:30:59] <_joe_>	 p858snake|L2: yes that's the main issue that needs solving on the dev side
[08:32:07] <Revent>	 p858snake|L2: A specific thing I have noticed is that there are broken transcodes in the ‘timedmediahandler’ list that are not shown as broken on the file page, but are…
[08:32:26] <p858snake|L2>	 have you opened a task about that issue?
[08:33:08] <Revent>	 I’ve mainly just been trying to ‘fix’ the transcodes.
[08:33:49] <Revent>	 But, sometimes in the middle of a list of transcodes that took hours, there will be one that supposedly was ‘successful’ in like 5 minutes.
[08:33:58] <grrrit-wm>	 (CR) Jcrespo: [C: 1] "Ok on codfw only." [mediawiki-config] - https://gerrit.wikimedia.org/r/320166 (https://phabricator.wikimedia.org/T149553) (owner: Marostegui)
[08:34:26] <Revent>	 I guess I can open a ticket for (and leave broken) the next ones I notice.
[08:36:33] <Revent>	 p858snake|L2: https://commons.wikimedia.org/wiki/File:20160914_Meeting_of_the_Presidents_Export_Council_HD.webm <- the 480P OGG transcode
[08:37:08] <grrrit-wm>	 (CR) Marostegui: [C: 2] db-codfw.php: Depool db2042 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/320166 (https://phabricator.wikimedia.org/T149553) (owner: Marostegui)
[08:37:40] <grrrit-wm>	 (Merged) jenkins-bot: db-codfw.php: Depool db2042 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/320166 (https://phabricator.wikimedia.org/T149553) (owner: Marostegui)
[08:38:01] <p858snake|L2>	 Revent: yes, leaving stuff broken so people can look into the causes is generally a good idea, our crystal balls aren't always the best to try to find causes after things have been repaired
[08:38:54] <Revent>	 (lol)
[08:39:43] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2042 for maintenance - T149553 (duration: 00m 50s)
[08:39:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:49] <stashbot>	 T149553: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553
[08:42:16] <wikibugs>	 Operations, Wikimedia-General-or-Unknown: Extend capacity for video scalers - https://phabricator.wikimedia.org/T150067#2775570 (MoritzMuehlenhoff)
[08:43:22] <wikibugs>	 Operations, Wikimedia-General-or-Unknown: Extend capacity for video scalers - https://phabricator.wikimedia.org/T150067#2773188 (MoritzMuehlenhoff) Instead of repurposing the codfw scaler (we actually have only a single one) we should rather expand the capacity in codfw. Also, both video scalers in eqiad...
[08:44:43] <marostegui>	 !log stopping mysql on db2042 - maintenance- T149553
[08:44:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:45:19] <Revent>	 p858snake|L2: https://phabricator.wikimedia.org/T150158 <- probably not a great description.
[08:46:28] <moritzm>	 !log uploaded linux-meta 1.11 to carbon (pointing to the new Linux ABI package)
[08:46:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:05:15] <grrrit-wm>	 (PS2) Gilles: Set environment variables for ImageMagick running inside Thumbor [puppet] - https://gerrit.wikimedia.org/r/319807 (https://phabricator.wikimedia.org/T149985)
[09:10:18] <wikibugs>	 Operations, Performance-Team, Thumbor, Patch-For-Review: add thumbor to production infrastructure - https://phabricator.wikimedia.org/T139606#2775602 (Gilles)
[09:10:21] <wikibugs>	 Operations, Performance-Team, Thumbor: Ask firejail upstream about ability to turn off pid namespacing - https://phabricator.wikimedia.org/T149981#2775600 (Gilles) Open>Resolved > You cannot turn off PID namespace, it is hardcoded deep inside the program. If you do "ps aux" outside of sandbox...
[09:26:41] <wikibugs>	 Operations, ops-eqiad, DBA: Multiple hardware issues on db1073 - https://phabricator.wikimedia.org/T149728#2760850 (jcrespo) p:Triage>Normal
[09:29:36] <grrrit-wm>	 (PS2) Ema: site: apply role::systemtap::devserver to copper [puppet] - https://gerrit.wikimedia.org/r/319616
[09:29:44] <grrrit-wm>	 (CR) Ema: [C: 2 V: 2] site: apply role::systemtap::devserver to copper [puppet] - https://gerrit.wikimedia.org/r/319616 (owner: Ema)
[09:34:51] <_joe_>	 ema: please include that role inside role::builder
[09:35:18] <_joe_>	 I can do it for you, but I am trying to reduce the number of places where hiera calls can end up
[09:36:44] <ema>	 _joe_: sounds good, I'll add you to the reviewers :)
[09:37:12] <grrrit-wm>	 (PS1) Gehel: elasticsearch - /etc/elasticsearch/scripts required for elasticsearch start up [puppet] - https://gerrit.wikimedia.org/r/320168
[09:45:30] <grrrit-wm>	 (PS1) Ema: Add role::systemtap::devserver to role::builder [puppet] - https://gerrit.wikimedia.org/r/320169
[10:03:59] <wikibugs>	 Operations, Analytics, ChangeProp, Citoid, and 11 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#2775679 (Amire80) p:Triage>Normal
[10:04:26] <wikibugs>	 Operations, Analytics, ChangeProp, Citoid, and 11 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#2748980 (Amire80) p:Normal>Triage
[10:07:31] <jynus>	 !log performing schema change on s5 (imagelinks) T139090
[10:07:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:07:38] <stashbot>	 T139090: Deploy I2b042685 to all databases - https://phabricator.wikimedia.org/T139090
[10:09:39] <wikibugs>	 Operations: Remote IPMI doesn't work for ~17% of the fleet - https://phabricator.wikimedia.org/T150160#2775695 (Volans)
[10:19:28] <moritzm>	 !log rebooting bast4001 for kernel update
[10:19:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:25:47] <grrrit-wm>	 (PS2) Ema: Add role::systemtap::devserver to role::builder [puppet] - https://gerrit.wikimedia.org/r/320169
[10:26:52] <moritzm>	 !log rebooting cp1008 for kernel update
[10:26:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:28:43] <grrrit-wm>	 (CR) Gehel: "puppet compiler: https://puppet-compiler.wmflabs.org/4552/" [puppet] - https://gerrit.wikimedia.org/r/320168 (owner: Gehel)
[10:28:47] <grrrit-wm>	 (PS2) Gehel: elasticsearch - /etc/elasticsearch/scripts required for elasticsearch start up [puppet] - https://gerrit.wikimedia.org/r/320168
[10:30:30] <grrrit-wm>	 (PS1) Giuseppe Lavagetto: docker::registry: allow fetching images from the internet [puppet] - https://gerrit.wikimedia.org/r/320172
[10:31:17] <grrrit-wm>	 (CR) Giuseppe Lavagetto: [C: 1] Add role::systemtap::devserver to role::builder [puppet] - https://gerrit.wikimedia.org/r/320169 (owner: Ema)
[10:31:42] <_joe_>	 ema: a sanity check on my change would be appreciated :)
[10:32:42] <grrrit-wm>	 (CR) Ema: [C: 2 V: 2] Add role::systemtap::devserver to role::builder [puppet] - https://gerrit.wikimedia.org/r/320169 (owner: Ema)
[10:33:52] <ema>	 _joe_: looking
[10:35:14] <grrrit-wm>	 (PS2) Gehel: Maps - tilerator on all maps servers needs access to postgresql master [puppet] - https://gerrit.wikimedia.org/r/319893 (https://phabricator.wikimedia.org/T147223)
[10:39:22] <_joe_>	 is zuul down again?
[10:39:51] <grrrit-wm>	 (PS2) Giuseppe Lavagetto: docker::registry: allow fetching images from the internet [puppet] - https://gerrit.wikimedia.org/r/320172
[10:39:53] <grrrit-wm>	 (CR) Ema: [C: -1] docker::registry: allow fetching images from the internet (3 comments) [puppet] - https://gerrit.wikimedia.org/r/320172 (owner: Giuseppe Lavagetto)
[10:40:45] <ema>	 _joe_: a few comments, plus the change fails to compile (https://puppet-compiler.wmflabs.org/4557/)
[10:40:56] <_joe_>	 yeah I know
[10:41:06] <_joe_>	 I already fixed _that_
[10:41:35] <_joe_>	 not your other comments
[10:41:42] <ema>	 k
[10:44:15] <ema>	 heh also when setting be_opts we sometimes use integers and other times strings for the port numbers
[10:44:35] <grrrit-wm>	 (PS3) Gehel: elasticsearch - /etc/elasticsearch/scripts required for elasticsearch start up [puppet] - https://gerrit.wikimedia.org/r/320168
[10:44:46] <icinga-wm>	 PROBLEM - puppet last run on relforge1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:45:46] <icinga-wm>	 RECOVERY - puppet last run on relforge1002 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[10:46:20] <grrrit-wm>	 (CR) Gehel: [C: 2] elasticsearch - /etc/elasticsearch/scripts required for elasticsearch start up [puppet] - https://gerrit.wikimedia.org/r/320168 (owner: Gehel)
[10:48:11] <grrrit-wm>	 (PS3) Giuseppe Lavagetto: docker::registry: allow fetching images from the internet [puppet] - https://gerrit.wikimedia.org/r/320172
[10:48:22] <grrrit-wm>	 (CR) Giuseppe Lavagetto: docker::registry: allow fetching images from the internet (3 comments) [puppet] - https://gerrit.wikimedia.org/r/320172 (owner: Giuseppe Lavagetto)
[10:49:06] <moritzm>	 !log rebooting mw1017/mw1099 for kernel update
[10:49:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:50:24] <grrrit-wm>	 (CR) Gehel: "puppet compiler seems to agree: https://puppet-compiler.wmflabs.org/4555/" [puppet] - https://gerrit.wikimedia.org/r/319893 (https://phabricator.wikimedia.org/T147223) (owner: Gehel)
[10:53:51] <grrrit-wm>	 (PS4) Giuseppe Lavagetto: docker::registry: allow fetching images from the internet [puppet] - https://gerrit.wikimedia.org/r/320172
[10:54:46] <icinga-wm>	 PROBLEM - puppet last run on ms-be1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:02:04] <grrrit-wm>	 (Abandoned) Hashar: Jenkins job validation (DO NOT SUBMIT) [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/301192 (owner: Hashar)
[11:02:22] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: -1] Maps - tilerator on all maps servers needs access to postgresql master (1 comment) [puppet] - https://gerrit.wikimedia.org/r/319893 (https://phabricator.wikimedia.org/T147223) (owner: Gehel)
[11:07:27] <wikibugs>	 Operations: update-ca-certificates, run via puppets sslcert module, doesn't update symlinks to replaced certificates - https://phabricator.wikimedia.org/T150058#2775860 (akosiaris) @AlexMonk-WMF As @Joe said, we've copied over the CA in production (2 months ago in fact). palladium has already been shutdown....
[11:23:46] <icinga-wm>	 RECOVERY - puppet last run on ms-be1003 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[11:26:38] <wikibugs>	 Operations, Prod-Kubernetes, Traffic, Kubernetes-production-experiment: Make our docker registry public - https://phabricator.wikimedia.org/T150168#2775898 (Joe)
[11:27:05] <wikibugs>	 Operations, Prod-Kubernetes, Traffic, Kubernetes-production-experiment: Make our docker registry public - https://phabricator.wikimedia.org/T150168#2775912 (Joe) p:Triage>Normal
[11:27:18] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: 1] "I am not in love with the place of the file (/etc/ferm) but I get why it's proposed to be there and I have no better proposal. So +1" [puppet] - https://gerrit.wikimedia.org/r/319071 (owner: Muehlenhoff)
[11:27:23] <wikibugs>	 Operations, Traffic, Patch-For-Review: Better handling for one-hit-wonder objects - https://phabricator.wikimedia.org/T144187#2775914 (Danielsberger) Ok, here are the new results for cache sizes between 50GB and 400GB. For now, I only looked at the Filter and Exp admission policies.  Disclaimer: The...
[11:27:26] <grrrit-wm>	 (PS5) Giuseppe Lavagetto: docker::registry: allow fetching images from the internet [puppet] - https://gerrit.wikimedia.org/r/320172 (https://phabricator.wikimedia.org/T150168)
[11:30:41] <grrrit-wm>	 (CR) Ema: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/320172 (https://phabricator.wikimedia.org/T150168) (owner: Giuseppe Lavagetto)
[11:33:41] <moritzm>	 !log rebooting cassandra test hosts (cerium, praseodymium, xenon) for kernel update
[11:33:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:35:05] <grrrit-wm>	 (PS1) Giuseppe Lavagetto: Add entry for docker-registry.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/320177 (https://phabricator.wikimedia.org/T150168)
[11:35:36] <wikibugs>	 Operations, Traffic: reimage cp4016 and cp1055 - https://phabricator.wikimedia.org/T149843#2775933 (ema) Open>Resolved a:ema The hosts have been reimaged on 2016-11-02.
[11:37:46] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: 2] ores (labs): Define log directory in worker nodes [puppet] - https://gerrit.wikimedia.org/r/319984 (https://phabricator.wikimedia.org/T149925) (owner: Ladsgroup)
[11:37:53] <grrrit-wm>	 (PS2) Alexandros Kosiaris: ores (labs): Define log directory in worker nodes [puppet] - https://gerrit.wikimedia.org/r/319984 (https://phabricator.wikimedia.org/T149925) (owner: Ladsgroup)
[11:37:56] <grrrit-wm>	 (CR) Alexandros Kosiaris: [V: 2] ores (labs): Define log directory in worker nodes [puppet] - https://gerrit.wikimedia.org/r/319984 (https://phabricator.wikimedia.org/T149925) (owner: Ladsgroup)
[11:40:39] <ema>	 !log cp3043: repool varnish-be and varnish-be-rand (T149881)
[11:40:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:40:46] <stashbot>	 T149881: varnish-be not restarting correctly because of disk space issues - https://phabricator.wikimedia.org/T149881
[11:40:56] <_joe_>	 I have been waiting 5 minutes for jenkins to catch up on a dns change
[11:42:46] <icinga-wm>	 PROBLEM - puppet last run on wtp1015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:48:03] <grrrit-wm>	 (PS1) Ema: cache_misc: use integers for port numbers [puppet] - https://gerrit.wikimedia.org/r/320179
[11:52:25] <grrrit-wm>	 (CR) Giuseppe Lavagetto: [C: 2] Add entry for docker-registry.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/320177 (https://phabricator.wikimedia.org/T150168) (owner: Giuseppe Lavagetto)
[11:52:26] <wikibugs>	 Operations, Analytics-Kanban, Traffic: Varnishlog with Start timestamp but no Resp one causing data consistency check alarms - https://phabricator.wikimedia.org/T148412#2775954 (elukey) Quick summary:   Brandon and Ema debugged the upload issue and figured out that it was related to the absence of a...
[11:56:09] <grrrit-wm>	 (PS1) Ema: cache_text esams: route to codfw [puppet] - https://gerrit.wikimedia.org/r/320180 (https://phabricator.wikimedia.org/T131503)
[11:57:17] <grrrit-wm>	 (PS1) Ema: cache_text: upgrade esams to Varnish 4 [puppet] - https://gerrit.wikimedia.org/r/320181 (https://phabricator.wikimedia.org/T131503)
[11:57:19] <grrrit-wm>	 (PS6) Giuseppe Lavagetto: docker::registry: allow fetching images from the internet [puppet] - https://gerrit.wikimedia.org/r/320172 (https://phabricator.wikimedia.org/T150168)
[11:59:22] <grrrit-wm>	 (CR) Giuseppe Lavagetto: [C: 2 V: 2] docker::registry: allow fetching images from the internet [puppet] - https://gerrit.wikimedia.org/r/320172 (https://phabricator.wikimedia.org/T150168) (owner: Giuseppe Lavagetto)
[12:00:33] <moritzm>	 !log rebooting wtp1001 for kernel update
[12:00:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:02:10] <wikibugs>	 Operations: eqiad: 1 hardware access request for labs on real hardware (mwoffliner) - https://phabricator.wikimedia.org/T117095#2775967 (Kelson) @Andrew @RobH @chasemp @AlexMonk-WMF Thank you for taking time to answer to this ticket. I'm now back on this topic after a long summer pause.  Purpose: Create ZIM...
[12:02:46] <icinga-wm>	 PROBLEM - puppet last run on darmstadtium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:03:05] <grrrit-wm>	 (03PS1) 10Giuseppe Lavagetto: docker::registry: fix parser function call [puppet] - 10https://gerrit.wikimedia.org/r/320182 
[12:03:06] <_joe_>	 that's me ^^
[12:05:33] <grrrit-wm>	 (03CR) 10Giuseppe Lavagetto: [C: 032] docker::registry: fix parser function call [puppet] - 10https://gerrit.wikimedia.org/r/320182 (owner: 10Giuseppe Lavagetto)
[12:06:46] <icinga-wm>	 RECOVERY - puppet last run on darmstadtium is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[12:09:19] <jynus>	 !log performing schema change on s6 (imagelinks) T139090
[12:09:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:09:26] <stashbot>	 T139090: Deploy I2b042685 to all databases - https://phabricator.wikimedia.org/T139090
[12:10:36] <wikibugs>	 06Operations, 06Analytics-Kanban, 10Traffic: Varnishlog with Start timestamp but no Resp one causing data consistency check alarms - https://phabricator.wikimedia.org/T148412#2775972 (10elukey) From http://book.varnish-software.com/4.0/chapters/Tuning.html:  ``` Varnish operates with multiple pools of thread...
[12:11:46] <icinga-wm>	 RECOVERY - puppet last run on wtp1015 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[12:12:46] <icinga-wm>	 PROBLEM - puppet last run on darmstadtium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[nginx]
[12:14:44] <wikibugs>	 06Operations: Remote IPMI doens't work for ~17% of the fleet - https://phabricator.wikimedia.org/T150160#2775982 (10MoritzMuehlenhoff) Updating the host provisioning docs in combination with a daily Icinga check sounds like the best approach to me.
[12:19:17] <wikibugs>	 06Operations, 10OTRS: Intermittent 503 errors on OTRS ticket system when sending responses to tickets - https://phabricator.wikimedia.org/T148299#2775991 (10Josve05a) Hmm, seems to have been resolved (or maybe I'm not active enough on OTRS to catch it anymore), but ever since you've asked me to save the next...
[12:27:48] <wikibugs>	 06Operations, 10Gerrit, 06Release-Engineering-Team, 13Patch-For-Review: Investigate why gerrit slowed down on 17/10/2016 / 18/10/2016 / 21/10/2016 - https://phabricator.wikimedia.org/T148478#2775995 (10ArielGlenn) This needs: reading by experts, lots of cleanup, simplification probably, plus see todos at t...
[12:28:35] <grrrit-wm>	 (03PS1) 10Giuseppe Lavagetto: docker::registry: wrap nginx directives in a location [puppet] - 10https://gerrit.wikimedia.org/r/320184 (https://phabricator.wikimedia.org/T150168) 
[12:31:57] <grrrit-wm>	 (03CR) 10Giuseppe Lavagetto: [C: 032] docker::registry: wrap nginx directives in a location [puppet] - 10https://gerrit.wikimedia.org/r/320184 (https://phabricator.wikimedia.org/T150168) (owner: 10Giuseppe Lavagetto)
[12:38:19] <grrrit-wm>	 (03PS1) 10Giuseppe Lavagetto: docker::registry: return 403 instead of 405 as an error [puppet] - 10https://gerrit.wikimedia.org/r/320185 (https://phabricator.wikimedia.org/T150168) 
[12:40:10] <wikibugs>	 06Operations, 10Gerrit, 06Release-Engineering-Team, 13Patch-For-Review: Investigate why gerrit slowed down on 17/10/2016 / 18/10/2016 / 21/10/2016 - https://phabricator.wikimedia.org/T148478#2776049 (10ArielGlenn) @demon We had talked about the difference in memory usage for cobalt and elasticsearch, how t...
[12:40:49] <grrrit-wm>	 (03CR) 10Giuseppe Lavagetto: [C: 032] docker::registry: return 403 instead of 405 as an error [puppet] - 10https://gerrit.wikimedia.org/r/320185 (https://phabricator.wikimedia.org/T150168) (owner: 10Giuseppe Lavagetto)
[12:42:46] <icinga-wm>	 RECOVERY - puppet last run on darmstadtium is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[12:49:59] <hashar>	 jouncebot: neilpquinn 
[12:50:02] <hashar>	 jouncebot: next
[12:50:02] <jouncebot>	 In 1 hour(s) and 9 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161107T1400)
[12:50:04] <hashar>	 sorry 
[12:54:58] <Urbanecm>	 Hi everybody. How is $wgLocaltimezone set for cswiki? It seems it isn't good... It should be CET/CEST. It depends if summer time is now. 
[12:56:04] <wikibugs>	 06Operations, 10Traffic, 13Patch-For-Review: Better handling for one-hit-wonder objects - https://phabricator.wikimedia.org/T144187#2776071 (10Danielsberger) [[ https://github.com/dasebe/libvmod-cacheadmission/blob/master/vcl/ExpLRU.vcl | Here ]]'s some VCL/inline c that implements the exp-size admission poli...
[12:56:08] <Nemo_bis>	 'cswiki' => 'Europe/Prague', // T73902
[12:56:09] <stashbot>	 T73902: Set timezone for cs.Wikipedia and cs.Wikinews - https://phabricator.wikimedia.org/T73902
[12:56:47] <Nemo_bis>	 Btw, this is a question for #wikimedia-tech
[12:57:06] <Urbanecm>	 Sorry. I'm in that channel, I'll ask other related questions there.
[12:59:15] <wikibugs>	 06Operations, 10OTRS: Intermittent 503 errors on OTRS ticket system when sending responses to tickets - https://phabricator.wikimedia.org/T148299#2776074 (10akosiaris) 05Open>03Invalid @Josve05a Hm. Weird. Anyway, I am tentatively resolving as Invalid for now. Don't hesitate to reopen with a log if it ther...
[13:06:12] <grrrit-wm>	 (03PS5) 10Reedy: Add cronjob for regenerating captchas [puppet] - 10https://gerrit.wikimedia.org/r/319892 (https://phabricator.wikimedia.org/T150029) 
[13:06:36] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s6 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:06:36] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s7 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:06:36] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s4 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:06:36] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s5 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:06:36] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s7 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:06:46] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s1 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:06:47] <moritzm>	 !log rebooting scandium for kernel update
[13:06:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:06:56] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s6 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:06:56] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s3 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:06:56] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s4 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:06:56] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: m3 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:06:56] <icinga-wm>	 PROBLEM - MariaDB Slave IO: x1 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:06:56] <grrrit-wm>	 (03CR) 10Reedy: "PS5 adds --oldcaptcha" [puppet] - 10https://gerrit.wikimedia.org/r/319892 (https://phabricator.wikimedia.org/T150029) (owner: 10Reedy)
[13:06:57] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s2 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:07:06] <icinga-wm>	 PROBLEM - MariaDB Slave IO: m2 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:07:06] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s3 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:07:06] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: x1 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:07:16] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s1 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:07:16] <icinga-wm>	 PROBLEM - MariaDB Slave IO: m3 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:07:16] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s5 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:07:26] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: m2 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:07:26] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s2 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:08:46] <icinga-wm>	 PROBLEM - puppet last run on mw1268 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:13:16] <grrrit-wm>	 (03CR) 10Gehel: Maps - tilerator on all maps servers needs access to postgresql master (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/319893 (https://phabricator.wikimedia.org/T147223) (owner: 10Gehel)
[13:13:26] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag could not connect
[13:13:26] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: x1 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag could not connect
[13:13:26] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: m2 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag could not connect
[13:13:36] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s7 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag could not connect
[13:13:36] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s6 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag could not connect
[13:13:37] <wikibugs>	 06Operations, 10Mobile-Content-Service, 07Service-deployment-requests, 06Services (watching): New Service Request for Trending Edits Service - https://phabricator.wikimedia.org/T150043#2776093 (10Fjalapeno)
[13:13:46] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s5 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag could not connect
[13:13:46] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: m3 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag could not connect
[13:13:46] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s4 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag could not connect
[13:14:13] <grrrit-wm>	 (03CR) 10BBlack: [C: 031] cache_misc: use integers for port numbers [puppet] - 10https://gerrit.wikimedia.org/r/320179 (owner: 10Ema)
[13:14:16] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag could not connect
[13:14:16] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag could not connect
[13:14:56] <bblack>	 jynus: ^ ?
[13:17:46] <wikibugs>	 06Operations, 10Wikimedia-General-or-Unknown: Extend capacity for video scalers - https://phabricator.wikimedia.org/T150067#2776098 (10Reedy) Sounds like a good plan. I guess a newer CPU generation or two is going to provide some reasonable gains to begin with
[13:17:56] <bblack>	 marostegui: ?
[13:19:28] <hashar>	 !log shutting down Nodepool (labnodepool1001.eqiad.wmnet reboot)
[13:19:30] <marostegui>	 bblack: not me, but let me check
[13:19:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:20:47] <wikibugs>	 06Operations, 10Mobile-Content-Service, 07Service-deployment-requests, 06Services (watching): New Service Request for Trending Edits Service - https://phabricator.wikimedia.org/T150043#2776101 (10Fjalapeno)
[13:22:06] <icinga-wm>	 PROBLEM - Disk space on graphite1002 is CRITICAL: DISK CRITICAL - free space: /boot 0 MB (0% inode=98%)
[13:22:40] <jynus>	 I do not know what it is
[13:22:50] <jynus>	 did it go down?
[13:22:52] <marostegui>	 No
[13:22:56] <marostegui>	 it has too many connections
[13:23:00] <grrrit-wm>	 (03PS1) 10Urbanecm: Whitelisting domain for GWToolset [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320190 (https://phabricator.wikimedia.org/T150167) 
[13:23:23] <jynus>	 the schema change?
[13:23:32] <marostegui>	 which schema change?
[13:23:40] <jynus>	 see sal
[13:23:49] <bblack>	 12:09 < jynus> !log performing schema change on s6 (imagelinks) T139090
[13:23:50] <stashbot>	 T139090: Deploy I2b042685 to all databases - https://phabricator.wikimedia.org/T139090
[13:24:27] <jynus>	 but that should only affect s6
[13:24:37] <jynus>	 and it should not have production traffic
[13:25:26] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s5 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[13:25:26] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s7 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[13:25:26] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s7 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[13:25:26] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s4 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[13:25:36] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s7 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 87911.68 seconds
[13:25:36] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s6 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 87668.65 seconds
[13:25:37] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s1 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[13:25:46] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s4 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[13:25:47] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s6 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[13:25:48] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s3 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[13:25:48] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s2 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[13:25:48] <icinga-wm>	 RECOVERY - MariaDB Slave IO: x1 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[13:25:48] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: m3 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[13:25:48] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: m3 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 1078.31 seconds
[13:25:48] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s4 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 87677.32 seconds
[13:25:48] <jynus>	 it does not have an extra port
[13:25:49] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s5 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 87665.32 seconds
[13:25:52] <jynus>	 like production
[13:25:56] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s3 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[13:25:56] <icinga-wm>	 RECOVERY - MariaDB Slave IO: m2 on dbstore1001 is OK: OK slave_io_state not a slave
[13:25:57] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: x1 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[13:25:59] <jynus>	 we need to fix that
[13:26:06] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s1 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[13:26:06] <icinga-wm>	 RECOVERY - MariaDB Slave IO: m3 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[13:26:06] <icinga-wm>	 RECOVERY - Disk space on graphite1002 is OK: DISK OK
[13:26:06] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s5 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[13:26:16] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: m2 on dbstore1001 is OK: OK slave_sql_state not a slave
[13:26:16] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 80196.83 seconds
[13:26:16] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 80409.84 seconds
[13:26:16] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s2 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[13:26:25] <moritzm>	 !log rebooting labnodepool1001 for kernel update
[13:26:26] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 87589.95 seconds
[13:26:26] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: x1 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 976.94 seconds
[13:26:26] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: m2 on dbstore1001 is OK: OK slave_sql_lag not a slave
[13:26:27] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s6 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[13:26:27] <jynus>	 Waiting for table metadata lock would explain it
[13:26:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:26:42] <marostegui>	 yes, it is still there indeed
[13:26:53] <jynus>	 but not why it happened
[13:26:59] <marostegui>	 I am checking the graphs and I do not see any spikes really (at least yet)
[13:27:09] <jynus>	 this database should have no traffic to justify a metadata lock pileup
[13:27:20] <jynus>	 unlike production
[13:29:36] <wikibugs>	 06Operations, 10Mobile-Content-Service, 07Service-deployment-requests, 06Services (watching): New Service Request for Trending Edits Service - https://phabricator.wikimedia.org/T150043#2776128 (10Fjalapeno)
[13:29:47] <wikibugs>	 06Operations, 10Gerrit, 06Release-Engineering-Team, 13Patch-For-Review: Investigate why gerrit slowed down on 17/10/2016 / 18/10/2016 / 21/10/2016 - https://phabricator.wikimedia.org/T148478#2776129 (10ArielGlenn) >>! In T148478#2750773, @Dzahn wrote:  > `root@cobalt:~# java -XX:+PrintFlagsFinal -version |...
[13:32:06] <icinga-wm>	 PROBLEM - puppet last run on osmium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:32:22] <jynus>	 maybe there is some dump process happening at the same time, I will check cron
[13:34:01] <hashar>	 !log Flushed nodepool instances. It is bringing up fresh one now.
[13:34:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:34:30] <jynus>	 nope, dumps are "0 1 * * 3"
[13:35:33] <jynus>	 Has anyone here queried dbstore1001? It is not a problem, and it would give a reason for the issues, which right now are unknown
[13:35:46] <icinga-wm>	 RECOVERY - puppet last run on mw1268 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[13:36:32] <jynus>	 "alter table imagelinks DROP INDEX il_backlinks_namespace, ADD INDEX il_backlinks_namespace (il_from_namespace,il_to,il_from)" is running
[13:36:33] <gehel>	 !log reboot wdqs2* for kernel upgrade
[13:36:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:36:47] <jynus>	 but it requires other queries to create the metadata lock
[13:38:22] <grrrit-wm>	 (03CR) 10Faidon Liambotis: [C: 04-1] "I think I'd like to see this in the same commit that applies this (you mentioned a systemd unit?), as otherwise it's too much of a noop." [puppet] - 10https://gerrit.wikimedia.org/r/319071 (owner: 10Muehlenhoff)
[13:39:44] <grrrit-wm>	 (03PS3) 10Hashar: Enable wgAbuseFilterProfile at cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/319566 (https://phabricator.wikimedia.org/T149899) (owner: 10Urbanecm)
[13:39:46] <grrrit-wm>	 (03PS5) 10Hashar: Enable Extension:ShortURL on bd.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/311656 (https://phabricator.wikimedia.org/T146014) (owner: 10MarcoAurelio)
[13:39:48] <grrrit-wm>	 (03PS4) 10Hashar: Rename 'autopatrol' to 'autopatrolled' on fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/308446 (https://phabricator.wikimedia.org/T144699) (owner: 10MarcoAurelio)
[13:39:50] <grrrit-wm>	 (03PS3) 10Hashar: Allow local sysops to add accountcreator group in fiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/319805 (https://phabricator.wikimedia.org/T149986) (owner: 10Urbanecm)
[13:39:52] <grrrit-wm>	 (03PS3) 10Hashar: Allow reviewers to stabilize pages in Finnish Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/319808 (https://phabricator.wikimedia.org/T149987) (owner: 10Urbanecm)
[13:39:54] <hashar>	 (rebased patches for SWAT)
[13:39:54] <grrrit-wm>	 (03PS3) 10Hashar: Enable $wgAbuseFilterProfile at Meta-Wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/319569 (https://phabricator.wikimedia.org/T149901) (owner: 10MarcoAurelio)
[13:41:06] <icinga-wm>	 PROBLEM - puppet last run on es1012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:41:19] <jynus>	 https://grafana.wikimedia.org/dashboard/db/mysql?var-dc=eqiad%20prometheus%2Fops&var-server=dbstore1001&from=1478523156994&to=1478525958122
[13:41:57] <Reedy>	 hashar: You'll have to rebase them again... :P
[13:42:00] <Reedy>	 After you start merging them
[13:42:28] <Urbanecm>	 jouncebot: next
[13:42:28] <jouncebot>	 In 0 hour(s) and 17 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161107T1400)
[13:42:44] <grrrit-wm>	 (03PS1) 10Thiemo Mättig (WMDE): Add missing $wgPropertySuggesterClassifyingPropertyIds for beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320192 
[13:42:50] <hashar>	 (having a break/coffee before swat)
[13:43:28] <hashar>	 Reedy: not sure why ?  once I get a full chain that is on the tip of the branch
[13:43:35] <hashar>	 I can just CR+2 each of them in whatever order
[13:43:40] <hashar>	 and they will eventually all land
[13:43:44] <hashar>	 brb
[13:43:45] <Reedy>	 I thought it prevented merge commits on the repo?
[13:44:06] <icinga-wm>	 RECOVERY - puppet last run on es1012 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[13:44:15] <hashar>	 Reedy: well they are fast forward right now :]
[13:44:36] <grrrit-wm>	 (03CR) 10Faidon Liambotis: [C: 04-1] Check whether ferm has been correctly started (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/318527 (https://phabricator.wikimedia.org/T148986) (owner: 10Muehlenhoff)
[13:49:56] <icinga-wm>	 PROBLEM - Host db2034 is DOWN: PING CRITICAL - Packet loss = 100%
[13:50:18] <paravoid>	 jynus, marostegui ^ ?
[13:50:31] <marostegui>	 paravoid: me, came back from downtime
[13:50:36] <arseny92>	 [15:43] <hashar> Reedy: not sure why ?  once I get a full chain that is on the tip of the branch ->> yes but once one is merged the branch becomes newer than the rest
[13:50:45] <marostegui>	 it has hardware issues
[13:50:47] <marostegui>	 :(
[13:51:58] <hashar>	 arseny92: that is handled
[13:52:00] <zeljkof>	 hashar: what's the plan with swat today? there are plenty of patches
[13:52:01] <bblack>	 !log depooling cp4018 nginx+varnish-fe services for debugging
[13:52:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:52:30] <hashar>	 zeljkof: going to deploy them all at once except the last config change that requires a script to be run
[13:53:39] <zeljkof>	 hashar: Ok, so you are doing swat today then?
[13:54:26] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s6 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 858.69 seconds
[13:54:56] <icinga-wm>	 RECOVERY - Host db2034 is UP: PING OK - Packet loss = 0%, RTA = 36.51 ms
[13:55:13] <hashar>	 zeljkof: yeah i will
[13:56:53] <gehel>	 !log reboot wdqs1* for kernel upgrade
[13:56:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:57:19] <marostegui>	 PROBLEM - MariaDB Slave Lag: s6 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 858.69 seconds -> that is the alter table probably (it is running)
[13:59:07] <grrrit-wm>	 (03PS1) 10Urbanecm: Enable Extension:ShortUrl for tcywiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320193 (https://phabricator.wikimedia.org/T150166) 
[13:59:34] <Urbanecm>	 hashar zeljkof Will it be possible to deploy nine patches? If it would I'll schedule it for another window. 
[13:59:45] <hashar>	 Urbanecm: yes 
[13:59:51] <wikibugs>	 06Operations, 10Monitoring: Huge log files on icinga machines - https://phabricator.wikimedia.org/T150061#2776176 (10Volans) @fgiunchedi @akosiaris @Joe  From the looks of it (I've taken just a quick look, correct me if I'm wrong):   - Puppet spam is caused by a missing sorting in the generated file from `nagge...
[14:00:02] <Urbanecm>	 Okay. So I'm going to add it to the calendar. 
[14:00:04] <jouncebot>	 addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161107T1400).
[14:00:04] <jouncebot>	 yurik, Urbanecm, and mafk: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[14:00:06] <icinga-wm>	 RECOVERY - puppet last run on osmium is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[14:00:27] <hashar>	 yurik not around for https://gerrit.wikimedia.org/r/#/c/320160/ :/
[14:00:48] <Urbanecm>	 Added. 
[14:00:53] <hashar>	 will try
[14:01:52] <arseny92>	 320190 there's no aligning whitespaces to show it as table
[14:02:33] <arseny92>	 see all the other lines are aligned
[14:02:46] <Urbanecm>	 Going to add. Sorry
[14:03:02] <yurik>	 here
[14:03:37] <grrrit-wm>	 (03CR) 10Faidon Liambotis: [C: 04-1] "Hardcoding our expiry thresholds for either GlobalSign or Let's Encrypt all over the tree isn't a great idea. It makes it much harder to a" [puppet] - 10https://gerrit.wikimedia.org/r/313805 (https://phabricator.wikimedia.org/T144293) (owner: 10Alex Monk)
[14:03:57] <hashar>	 yurik: good morning! I have pulled the Kartographer patch on mw1099
[14:04:04] <hashar>	 yurik: but I have no idea how to verify it works fine
[14:04:16] <icinga-wm>	 PROBLEM - puppet last run on ms-be1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:04:17] <yurik>	 hashar, testing...
[14:04:35] <grrrit-wm>	 (03PS2) 10Urbanecm: Whitelisting domain for GWToolset [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320190 (https://phabricator.wikimedia.org/T150167) 
[14:04:39] <wikibugs>	 06Operations, 10ops-codfw, 06DC-Ops, 13Patch-For-Review, 07Wikimedia-Incident: Labstore2001 controller or shelf failure - https://phabricator.wikimedia.org/T102626#2776187 (10Papaul) @madhuvishyi rebuild the RAID you should be able to see all disk now on H800. Let me know if you have any other questions....
[14:05:24] <Urbanecm>	 arseny92: Fixed. 
[14:05:25] <arseny92>	 k good
[14:05:31] <Urbanecm>	 Thanks. 
[14:05:43] <yurik>	 hashar, works
[14:06:13] <wikibugs>	 06Operations, 10ops-eqiad, 10hardware-requests: Return wmf4747/wmf4748/wmf4749/wmf4750 to spares - https://phabricator.wikimedia.org/T146171#2776190 (10faidon) So it's unclear to me what the next step is and who should be acting on this now; is it @RobH or @Cmjohnson?
[14:07:47] <grrrit-wm>	 (03CR) 10Faidon Liambotis: [C: 031] "LGTM, but perhaps we should consider renaming the script then, as "reimage --new" is an oxymoron :)" [puppet] - 10https://gerrit.wikimedia.org/r/318304 (https://phabricator.wikimedia.org/T148816) (owner: 10Volans)
[14:08:41] <volans>	 paravoid: --no-re (in the sense of removing the 're' from the name) :-P ^^^
[14:08:49] <paravoid>	 hah
[14:09:33] <grrrit-wm>	 (03PS2) 10Faidon Liambotis: mail: add an empty statement for 4.87+ compatibility [puppet] - 10https://gerrit.wikimedia.org/r/316956 
[14:09:48] <grrrit-wm>	 (03PS2) 10BBlack: remove debian perl ldflags patch [software/nginx] (wmf-1.11.4) - 10https://gerrit.wikimedia.org/r/320161 
[14:09:50] <grrrit-wm>	 (03PS2) 10BBlack: depend on lsb-base >= 3.0-6 [software/nginx] (wmf-1.11.4) - 10https://gerrit.wikimedia.org/r/320162 
[14:09:52] <grrrit-wm>	 (03PS5) 10BBlack: nginx (1.11.4-1+wmf14) jessie-wikimedia; urgency=medium [software/nginx] (wmf-1.11.4) - 10https://gerrit.wikimedia.org/r/319776 
[14:09:54] <grrrit-wm>	 (03PS3) 10BBlack: add stapling-multi-file patch [software/nginx] (wmf-1.11.4) - 10https://gerrit.wikimedia.org/r/320115 
[14:10:12] <hashar>	 yurik: syncing
[14:10:35] <grrrit-wm>	 (03CR) 10Faidon Liambotis: [C: 032] mail: add an empty statement for 4.87+ compatibility [puppet] - 10https://gerrit.wikimedia.org/r/316956 (owner: 10Faidon Liambotis)
[14:10:49] <grrrit-wm>	 (03CR) 10Faidon Liambotis: [V: 032] mail: add an empty statement for 4.87+ compatibility [puppet] - 10https://gerrit.wikimedia.org/r/316956 (owner: 10Faidon Liambotis)
[14:10:56] <logmsgbot>	 !log hashar@tin Synchronized php-1.29.0-wmf.1/extensions/Kartographer/extension.json: Fix monobook <maplink> (missing debounce dep) T145521 (duration: 00m 47s)
[14:11:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:11:02] <stashbot>	 T145521: <maplink> does not work in Monobook skin - https://phabricator.wikimedia.org/T145521
[14:13:14] <hashar>	 reviewing other patches
[14:14:04] <grrrit-wm>	 (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/319566 (https://phabricator.wikimedia.org/T149899) (owner: 10Urbanecm)
[14:14:21] <grrrit-wm>	 (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/319805 (https://phabricator.wikimedia.org/T149986) (owner: 10Urbanecm)
[14:14:34] <grrrit-wm>	 (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/319808 (https://phabricator.wikimedia.org/T149987) (owner: 10Urbanecm)
[14:14:39] <grrrit-wm>	 (03Merged) 10jenkins-bot: Enable wgAbuseFilterProfile at cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/319566 (https://phabricator.wikimedia.org/T149899) (owner: 10Urbanecm)
[14:15:03] <grrrit-wm>	 (03Merged) 10jenkins-bot: Allow local sysops to add accountcreator group in fiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/319805 (https://phabricator.wikimedia.org/T149986) (owner: 10Urbanecm)
[14:15:12] <grrrit-wm>	 (03Merged) 10jenkins-bot: Allow reviewers to stabilize pages in Finnish Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/319808 (https://phabricator.wikimedia.org/T149987) (owner: 10Urbanecm)
[14:15:29] <grrrit-wm>	 (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320190 (https://phabricator.wikimedia.org/T150167) (owner: 10Urbanecm)
[14:15:56] <icinga-wm>	 PROBLEM - puppet last run on ms-be1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:16:30] <hashar>	 bah
[14:16:56] <grrrit-wm>	 (03PS3) 10Hashar: Whitelisting domain for GWToolset [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320190 (https://phabricator.wikimedia.org/T150167) (owner: 10Urbanecm)
[14:17:07] <grrrit-wm>	 (03CR) 10Hashar: Whitelisting domain for GWToolset [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320190 (https://phabricator.wikimedia.org/T150167) (owner: 10Urbanecm)
[14:17:12] <grrrit-wm>	 (03CR) 10Hashar: [C: 032] Whitelisting domain for GWToolset [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320190 (https://phabricator.wikimedia.org/T150167) (owner: 10Urbanecm)
[14:17:43] <grrrit-wm>	 (03Merged) 10jenkins-bot: Whitelisting domain for GWToolset [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320190 (https://phabricator.wikimedia.org/T150167) (owner: 10Urbanecm)
[14:18:24] <arseny92>	 https://wikitech.wikimedia.org/w/index.php?title=Deployments&type=revision&diff=958951&oldid=958914
[14:18:26] <hashar>	 I got confused at some point
[14:19:10] <arseny92>	 311656 has dependson
[14:19:26] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s6 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 158.70 seconds
[14:19:38] <arseny92>	 which is only noted on gerrit
[14:19:52] <grrrit-wm>	 (03PS2) 10Ema: cache_misc: use integers for port numbers [puppet] - 10https://gerrit.wikimedia.org/r/320179 
[14:19:58] <grrrit-wm>	 (03CR) 10Ema: [C: 032 V: 032] cache_misc: use integers for port numbers [puppet] - 10https://gerrit.wikimedia.org/r/320179 (owner: 10Ema)
[14:20:15] <Urbanecm>	 hashar: At which point exactly? 
[14:20:30] <arseny92>	 why is there no consistency when one adds stuff to deploy and does not note the tasks or depending commits
[14:20:35] <hashar>	 by a rename in a gerrit change
[14:20:37] <hashar>	 not a big deal
[14:20:50] <hashar>	 I have pushed on mw1099 the first four changes by Urbanecm 
[14:21:04] <Urbanecm>	 Okay. So I'm going to test them. I'll let you know. 
[14:21:08] <Urbanecm>	 hashar: ^
[14:21:24] <arseny92>	 hashar be sure to note the tasks in log when you sync )
[14:21:34] <arseny92>	 per ^
[14:21:44] <arseny92>	 https://wikitech.wikimedia.org/w/index.php?title=Deployments&type=revision&diff=958951&oldid=958914
[14:23:46] <Urbanecm>	 hashar: 319805 can be deployed
[14:24:00] <grrrit-wm>	 (03CR) 10Hashar: [C: 04-1] "The extension requires a database schema change. Eg loading schemas/shorturls.sql and really I have no idea how we are handling them." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320193 (https://phabricator.wikimedia.org/T150166) (owner: 10Urbanecm)
[14:24:22] <grrrit-wm>	 (03PS2) 10Giuseppe Lavagetto: RESTBase: Add baseUriTemplate parameter. [puppet] - 10https://gerrit.wikimedia.org/r/319897 (owner: 10Ppchelko)
[14:24:56] <Urbanecm>	 hashar: Why C-1? Is my change wrong in some way? I already requested enabling a new extension, which almost always needs new tables... I'll try to find something after testing.
[14:24:56] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s6 on dbstore2001 is OK: OK slave_sql_state not a slave
[14:24:56] <icinga-wm>	 RECOVERY - MariaDB Slave IO: m2 on dbstore2001 is OK: OK slave_io_state not a slave
[14:25:06] <icinga-wm>	 RECOVERY - MariaDB Slave IO: m3 on dbstore2001 is OK: OK slave_io_state not a slave
[14:25:06] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s5 on dbstore2001 is OK: OK slave_sql_state not a slave
[14:25:06] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s7 on dbstore2001 is OK: OK slave_sql_state not a slave
[14:25:06] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: m3 on dbstore2001 is OK: OK slave_sql_state not a slave
[14:25:06] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s7 on dbstore2001 is OK: OK slave_io_state not a slave
[14:25:07] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s1 on dbstore2001 is OK: OK slave_io_state Slave_IO_Running: No, (no error: intentional)
[14:25:13] <grrrit-wm>	 (03CR) 10Hashar: [C: 04-1] "The extension requires a database schema change. Eg loading schemas/shorturls.sql and really I have no idea how we are handling them. Skip" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/311656 (https://phabricator.wikimedia.org/T146014) (owner: 10MarcoAurelio)
[14:25:16] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s2 on dbstore2001 is OK: OK slave_io_state not a slave
[14:25:16] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s3 on dbstore2001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[14:25:16] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s2 on dbstore2001 is OK: OK slave_sql_state not a slave
[14:25:16] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s1 on dbstore2001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[14:25:16] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: m2 on dbstore2001 is OK: OK slave_sql_state not a slave
[14:25:17] <icinga-wm>	 RECOVERY - MariaDB Slave IO: x1 on dbstore2001 is OK: OK slave_io_state not a slave
[14:25:17] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s6 on dbstore2001 is OK: OK slave_io_state not a slave
[14:25:18] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s3 on dbstore2001 is OK: OK slave_io_state Slave_IO_Running: No, (no error: intentional)
[14:25:26] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s4 on dbstore2001 is OK: OK slave_io_state Slave_IO_Running: No, (no error: intentional)
[14:25:26] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s5 on dbstore2001 is OK: OK slave_io_state not a slave
[14:25:29] <Urbanecm>	 hashar: 319808 can be deployed too
[14:25:36] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s4 on dbstore2001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[14:25:36] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: x1 on dbstore2001 is OK: OK slave_sql_state not a slave
[14:25:43] <grrrit-wm>	 (03PS4) 10Hashar: Enable $wgAbuseFilterProfile at Meta-Wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/319569 (https://phabricator.wikimedia.org/T149901) (owner: 10MarcoAurelio)
[14:26:04] <grrrit-wm>	 (03CR) 10Hashar: [C: 032] Enable $wgAbuseFilterProfile at Meta-Wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/319569 (https://phabricator.wikimedia.org/T149901) (owner: 10MarcoAurelio)
[14:26:29] <arseny92>	 hashar , by using the wikimediamaintenance extension scripts stuff etc iirc
[14:26:50] <grrrit-wm>	 (03Merged) 10jenkins-bot: Enable $wgAbuseFilterProfile at Meta-Wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/319569 (https://phabricator.wikimedia.org/T149901) (owner: 10MarcoAurelio)
[14:26:50] <hashar>	 going to add  [config] 319569 Enable $wgAbuseFilterProfile at Meta-Wiki
[14:27:03] <grrrit-wm>	 (03CR) 10Giuseppe Lavagetto: [C: 032] RESTBase: Add baseUriTemplate parameter. [puppet] - 10https://gerrit.wikimedia.org/r/319897 (owner: 10Ppchelko)
[14:27:05] <hashar>	 the couple changes that enable shortUrl, i got to do the schema change
[14:27:10] <hashar>	 so holding for now
[14:28:24] <arseny92>	 hashar , by using the wikimediamaintenance extension scripts stuff etc iirc
[14:28:38] <arseny92>	 to create tables for new extensions
[14:28:39] <hashar>	 Urbanecm: guess I will deploy all five changes in one go :}
[14:28:53] <Urbanecm>	 hashar: I'm only reporting the progress :)
[14:28:55] <_joe_>	 mobrovac: running puppet on the restbase hosts
[14:29:20] <Urbanecm>	 320190 is untestable at mw1099 because I have no rights to try it. 
[14:29:56] <Urbanecm>	 319566 can be deployed. 
[14:30:01] <hashar>	 Urbanecm: I am syncing them all
[14:30:06] <hashar>	 looks fine to me
[14:30:15] <Urbanecm>	 hashar: All except 320190 were tested by me, and 320190 is untestable for me.
[14:30:50] <logmsgbot>	 !log hashar@tin Synchronized wmf-config: (no message) (duration: 00m 53s)
[14:30:52] <grrrit-wm>	 (03CR) 10Ottomata: [C: 031] confluent::kafka::mirror::jmxtrans: key attr is declared more than once [puppet] - 10https://gerrit.wikimedia.org/r/319770 (owner: 10Dzahn)
[14:30:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:31:02] <arseny92>	 hashar be sure to note the tasks in log when you sync )
[14:31:06] <arseny92>	 uh
[14:31:14] <hashar>	 just assume the patches got deployed and close tasks
[14:31:28] <hashar>	 OK [config] 319566 Enable wgAbuseFilterProfile at cswiki
[14:31:28] <hashar>	 OK [config] 319805 Allow local sysops to add accountcreator group in fiwiki
[14:31:30] <hashar>	 OK [config] 319808 Allow reviewers to stabilize pages in Finnish Wikipedia
[14:31:30] <hashar>	 OK [config] 320190 CopyUploadsDomain addition
[14:31:32] <hashar>	 OK [config] 319569 Enable $wgAbuseFilterProfile at Meta-Wiki
[14:32:07] <Urbanecm>	 Thanks hashar. I'll close them. About the database changes, maybe https://wikitech.wikimedia.org/wiki/How_to_do_a_schema_change can help?
[14:32:09] <grrrit-wm>	 (03PS1) 10Muehlenhoff: Load connection tracking sysctl values via a separate systemd unit [puppet] - 10https://gerrit.wikimedia.org/r/320197 (https://phabricator.wikimedia.org/T136094) 
[14:32:16] <icinga-wm>	 RECOVERY - puppet last run on ms-be1013 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[14:32:40] <grrrit-wm>	 (03CR) 10Hashar: "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/308446 (https://phabricator.wikimedia.org/T144699) (owner: 10MarcoAurelio)
[14:33:43] <gehel>	 !log reboot maps-test* for kernel upgrade
[14:33:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:53] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Load connection tracking sysctl values via a separate systemd unit [puppet] - 10https://gerrit.wikimedia.org/r/320197 (https://phabricator.wikimedia.org/T136094) (owner: 10Muehlenhoff)
[14:34:14] <wikibugs>	 06Operations, 13Patch-For-Review: Not all packages from packages::statistics are available on jessie - https://phabricator.wikimedia.org/T150003#2776282 (10Ottomata) The various myspell* packages were requested in T99030 and T121011.  @halfak can comment as to whether they are still needed.  zpubsub was create...
[14:34:50] <grrrit-wm>	 (03CR) 10Hashar: [C: 032] Rename 'autopatrol' to 'autopatrolled' on fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/308446 (https://phabricator.wikimedia.org/T144699) (owner: 10MarcoAurelio)
[14:35:06] <arseny92>	 https://wikitech.wikimedia.org/wiki/Schema_changes#What_is_not_a_schema_change
[14:35:30] <Urbanecm>	 hashar: Closed. 
[14:35:35] <arseny92>	 creating new tables is not a schema change
[14:35:39] <grrrit-wm>	 (03CR) 10Gehel: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/319875 (owner: 10Muehlenhoff)
[14:36:19] <arseny92>	 wikimedia maintenance ext script is used iirc for new tables creation
[14:37:22] <grrrit-wm>	 (03PS5) 10Hashar: Rename 'autopatrol' to 'autopatrolled' on fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/308446 (https://phabricator.wikimedia.org/T144699) (owner: 10MarcoAurelio)
[14:37:41] <grrrit-wm>	 (03CR) 10Hashar: Rename 'autopatrol' to 'autopatrolled' on fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/308446 (https://phabricator.wikimedia.org/T144699) (owner: 10MarcoAurelio)
[14:37:47] <grrrit-wm>	 (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/308446 (https://phabricator.wikimedia.org/T144699) (owner: 10MarcoAurelio)
[14:38:27] <grrrit-wm>	 (03Merged) 10jenkins-bot: Rename 'autopatrol' to 'autopatrolled' on fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/308446 (https://phabricator.wikimedia.org/T144699) (owner: 10MarcoAurelio)
[14:39:02] <grrrit-wm>	 (03PS2) 10Muehlenhoff: Load connection tracking sysctl values via a separate systemd unit [puppet] - 10https://gerrit.wikimedia.org/r/320197 (https://phabricator.wikimedia.org/T136094) 
[14:40:12] <logmsgbot>	 !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Rename 'autopatrol' to 'autopatrolled' on fawiki - T144699 T139246 (duration: 00m 47s)
[14:40:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:40:19] <stashbot>	 T144699: Rename 'autopatrol' to 'autopatrolled' on fawiki - https://phabricator.wikimedia.org/T144699
[14:40:19] <stashbot>	 T139246: Migrate local group names to WikimediaMessages - https://phabricator.wikimedia.org/T139246
[14:41:24] <hashar>	 log fawiki: renaming user group 'autopatrol' to 'autopatrolled' for T139246 and T144699 with:  mwscript migrateUserGroup.php --wiki=fawiki 'autopatrol' 'autopatrolled'   
[14:42:34] <hashar>	 !log fawiki Done! 417 users in group 'autopatrol' are now in 'autopatrolled' instead.
[14:42:38] <hashar>	 !log fawiki: renaming user group 'autopatrol' to 'autopatrolled' for T139246 and T144699 with:  mwscript migrateUserGroup.php --wiki=fawiki 'autopatrol' 'autopatrolled'   
[14:42:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:42:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
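The group rename logged above can be sketched as a small helper that only builds the command line. The script name and arguments are copied from the log; the Python function `migrate_user_group_cmd` is a hypothetical name introduced for illustration, and nothing here actually invokes `mwscript`.

```python
import shlex


def migrate_user_group_cmd(wiki, old_group, new_group):
    """Build argv for the migrateUserGroup.php call seen in the log.

    Hypothetical helper: it only assembles the arguments and does not
    run anything on a maintenance host.
    """
    return ["mwscript", "migrateUserGroup.php", f"--wiki={wiki}", old_group, new_group]


cmd = migrate_user_group_cmd("fawiki", "autopatrol", "autopatrolled")
# Render the argv as a single shell-safe string for logging or review.
print(shlex.join(cmd))
```

Keeping the command as a list until the last moment avoids quoting mistakes if a group name ever contains spaces.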
[14:43:44] <grrrit-wm>	 (03PS2) 10Volans: wmf-auto-reimage: add option --new for new hosts [puppet] - 10https://gerrit.wikimedia.org/r/318304 (https://phabricator.wikimedia.org/T148816) 
[14:43:56] <icinga-wm>	 RECOVERY - puppet last run on ms-be1006 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[14:44:26] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 214, down: 1, dormant: 0, excluded: 0, unused: 0; xe-3/3/2: down
[14:44:52] <grrrit-wm>	 (03PS1) 10KartikMistry: Explicitly set cookieDomain for ContentTranslationSiteTemplates [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320200 (https://phabricator.wikimedia.org/T149879) 
[14:45:26] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 234, down: 1, dormant: 0, excluded: 0, unused: 0; xe-3/3/2: down
[14:46:36] <hashar>	 doing the shortUrls changes now
[14:46:48] <grrrit-wm>	 (03PS2) 10Hashar: Enable Extension:ShortUrl for tcywiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320193 (https://phabricator.wikimedia.org/T150166) (owner: 10Urbanecm)
[14:46:50] <grrrit-wm>	 (03PS6) 10Hashar: Enable Extension:ShortURL on bd.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/311656 (https://phabricator.wikimedia.org/T146014) (owner: 10MarcoAurelio)
[14:47:34] <grrrit-wm>	 (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320193 (https://phabricator.wikimedia.org/T150166) (owner: 10Urbanecm)
[14:47:45] <Urbanecm>	 hashar: Okay. 
[14:48:21] <grrrit-wm>	 (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/311656 (https://phabricator.wikimedia.org/T146014) (owner: 10MarcoAurelio)
[14:48:36] <icinga-wm>	 PROBLEM - puppet last run on cp3030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:48:53] <grrrit-wm>	 (03Merged) 10jenkins-bot: Enable Extension:ShortURL on bd.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/311656 (https://phabricator.wikimedia.org/T146014) (owner: 10MarcoAurelio)
[14:49:00] <grrrit-wm>	 (03Merged) 10jenkins-bot: Enable Extension:ShortUrl for tcywiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320193 (https://phabricator.wikimedia.org/T150166) (owner: 10Urbanecm)
[14:49:21] <Reedy>	 https://gerrit.wikimedia.org/r/320202 to add ShortUrl to createExtensionTables :P
[14:49:25] <hashar>	 !log terbium: scap pull to add shortUrl tables to bdwikimedia and tcywiki
[14:49:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:49:40] <hashar>	 eekkk
[14:52:42] <grrrit-wm>	 (03CR) 10Volans: [C: 032] wmf-auto-reimage: add option --new for new hosts [puppet] - 10https://gerrit.wikimedia.org/r/318304 (https://phabricator.wikimedia.org/T148816) (owner: 10Volans)
[14:52:48] <grrrit-wm>	 (03PS1) 10ArielGlenn: add local cruft to .gitignore [dumps] - 10https://gerrit.wikimedia.org/r/320204 
[14:53:57] <wikibugs>	 06Operations, 10Mobile-Content-Service, 07Service-deployment-requests, 06Services (watching): New Service Request for Trending Edits Service - https://phabricator.wikimedia.org/T150043#2776362 (10Fjalapeno)
[14:54:39] <wikibugs>	 06Operations, 10Mobile-Content-Service, 07Service-deployment-requests, 06Services (watching): New Service Request for Trending Edits Service - https://phabricator.wikimedia.org/T150043#2772430 (10Fjalapeno)
[14:54:56] <icinga-wm>	 PROBLEM - puppet last run on analytics1053 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:55:25] <grrrit-wm>	 (03CR) 10Muehlenhoff: "My systemd unit-based approach is at https://gerrit.wikimedia.org/r/#/c/320197/, but I'll look into the ferm-internal post hook. I didn't " [puppet] - 10https://gerrit.wikimedia.org/r/319071 (owner: 10Muehlenhoff)
[14:56:08] <arseny92>	 Urbanecm you're in charge of your own workboard so I'm not touching it (T150166)
[14:56:08] <stashbot>	 T150166: Create short URL for Tulu (tcy) Wikipedia - https://phabricator.wikimedia.org/T150166
[14:57:24] <Urbanecm>	 arseny92: I don't use any Done column in my workboard. All is ok :)
[14:57:55] <hashar>	 going to try to do the schema change :D
[14:58:28] <arseny92>	 uh
[14:59:17] <Reedy>	 hashar: mwscript sql.php --wiki=whatever php-1.29.0-wmf.1/extensions/ShortUrl/schemas/shorturls.sql
[14:59:35] <Reedy>	 Or do you need to cat |
[14:59:54] <Reedy>	 either works :)
[15:00:57] <mark>	 !log Deactivate cr1-eqiad BGP peering with pfw1-eqiad
[15:01:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:17] <wikibugs>	 06Operations, 10Mobile-Content-Service, 07Service-deployment-requests, 06Services (watching): New Service Request for Trending Edits Service - https://phabricator.wikimedia.org/T150043#2776390 (10Fjalapeno)
[15:01:32] <hashar>	 !log T150166 mwscript sql.php --wiki=tcywiki /srv/mediawiki/php-1.29.0-wmf.1/extensions/ShortUrl/schemas/shorturls.sql
[15:01:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:37] <stashbot>	 T150166: Create short URL for Tulu (tcy) Wikipedia - https://phabricator.wikimedia.org/T150166
[15:01:55] <hashar>	 !log T146014 mwscript sql.php --wiki=bdwikimedia /srv/mediawiki/php-1.29.0-wmf.1/extensions/ShortUrl/schemas/shorturls.sql
[15:02:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:02:03] <stashbot>	 T146014: Enable Extension:ShortUrl on chapterwiki of WMBD - https://phabricator.wikimedia.org/T146014
[15:02:13] <moritzm>	 !log rebooting mw1261-mw1265 (canary app servers) for kernel update
[15:02:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:02:37] <grrrit-wm>	 (03CR) 10ArielGlenn: [C: 032] add local cruft to .gitignore [dumps] - 10https://gerrit.wikimedia.org/r/320204 (owner: 10ArielGlenn)
[15:02:50] <hashar>	 !log T150166 mwscript extensions/ShortUrl/populateShortUrlTable.php --wiki=tcywiki  (1569 titles done)
[15:02:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:03:03] <hashar>	 !log T146014 mwscript extensions/ShortUrl/populateShortUrlTable.php --wiki=bdwikimedia (714 titles done)
[15:03:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
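The two manual steps logged above for each wiki (load the ShortUrl schema, then populate the table) can be sketched as a command builder. The paths and script names are taken from the `!log` lines; the function name `shorturl_setup_cmds` and the `branch` parameter are assumptions made for illustration, and the sketch only prints the commands rather than running them.

```python
import shlex


def shorturl_setup_cmds(wiki, branch="php-1.29.0-wmf.1"):
    """Return the two mwscript invocations from the log, in order.

    Hypothetical helper: the default branch matches the one in the log,
    but a real deployment would need the currently deployed branch.
    """
    schema = f"/srv/mediawiki/{branch}/extensions/ShortUrl/schemas/shorturls.sql"
    return [
        # Step 1: create the extension's table from its schema file.
        ["mwscript", "sql.php", f"--wiki={wiki}", schema],
        # Step 2: fill the table for the wiki's existing titles.
        ["mwscript", "extensions/ShortUrl/populateShortUrlTable.php", f"--wiki={wiki}"],
    ]


for wiki in ("tcywiki", "bdwikimedia"):
    for cmd in shorturl_setup_cmds(wiki):
        print(shlex.join(cmd))
```

The order matters: the populate script presumably needs the table to exist first, which is why the schema load is listed before it.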
[15:04:04] <hashar>	 ah finally
[15:04:11] <hashar>	 got the shortUrl enabled on mw1099
[15:04:56] <wikibugs>	 06Operations, 06Analytics-Kanban, 10hardware-requests: stat1001 replacement box in eqiad - https://phabricator.wikimedia.org/T149911#2776402 (10Ottomata) Sounds perfect, thank you.
[15:05:12] <grrrit-wm>	 (03PS1) 10Giuseppe Lavagetto: docker::registry: add monitoring [puppet] - 10https://gerrit.wikimedia.org/r/320206 
[15:05:18] <grrrit-wm>	 (03CR) 10ArielGlenn: "Well after a ridiculously long time I am ready to make this happen. We need to think about where the defaults file should live (hardcoded" [dumps] - 10https://gerrit.wikimedia.org/r/43156 (owner: 10Awight)
[15:05:44] <grrrit-wm>	 (03CR) 10Gehel: Maps - tilerator on all maps servers needs access to postgresql master (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/319893 (https://phabricator.wikimedia.org/T147223) (owner: 10Gehel)
[15:05:53] <mark>	 !log Chris moved cr1-eqiad:xe-5/0/3 to xe-3/3/2
[15:05:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:07:01] <mark>	 !log Reactivate cr1-eqiad BGP peering with pfw1-eqiad
[15:07:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:07:30] <marostegui>	 !log Enabling gtid_domain_id db1020 (m2 master) - T149418
[15:07:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:07:35] <stashbot>	 T149418: Deploy gtid_domain_id flag in our mysql hosts - https://phabricator.wikimedia.org/T149418
[15:08:27] <hashar>	 Urbanecm: arseny92 Reedy took me a while but I got shortUrl added ;}
[15:08:38] <mark>	 !log Deactivate cr2-eqiad BGP peering with pfw1-eqiad
[15:08:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:08:46] <Urbanecm>	 Good! All is completed at this time?
[15:08:52] <Reedy>	 hashar: the patch I made to wikimediamaintenance will make it easier for future :)
[15:09:20] <hashar>	 maybe we should just enable shortUrl everywhere ?
[15:09:22] <logmsgbot>	 !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: shortUrl for bdwikimedia and tcywiki T146014 and T150166 (duration: 01m 51s)
[15:09:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:09:28] <stashbot>	 T146014: Enable Extension:ShortUrl on chapterwiki of WMBD - https://phabricator.wikimedia.org/T146014
[15:09:28] <stashbot>	 T150166: Create short URL for Tulu (tcy) Wikipedia - https://phabricator.wikimedia.org/T150166
[15:09:35] <arseny92>	 Reedy , maybe patch it to support every extension that we run?
[15:09:41] <Reedy>	 arseny92: Why?
[15:10:01] <Reedy>	 Most extensions that we enable on demand are in there
[15:10:10] <Reedy>	 The rest should be added via addWiki at creation
[15:10:14] <hashar>	 !log European SWAT completed
[15:10:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:42] <arseny92>	 Reedy: because that would make it easier for future cases
[15:11:41] <Reedy>	 I think most of them are there
[15:13:19] <mark>	 !log Chris moved cr2-eqiad:xe-5/0/3 to xe-3/3/2
[15:13:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:14:05] <mark>	 !log Reactivate cr2-eqiad BGP peering with pfw1-eqiad
[15:14:10] <grrrit-wm>	 (03CR) 10Giuseppe Lavagetto: [C: 032] hhvm::admin: Restrict to domain networks [puppet] - 10https://gerrit.wikimedia.org/r/304476 (owner: 10Muehlenhoff)
[15:14:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:15:30] <grrrit-wm>	 (03CR) 10Giuseppe Lavagetto: [C: 032] docker::registry: add monitoring [puppet] - 10https://gerrit.wikimedia.org/r/320206 (owner: 10Giuseppe Lavagetto)
[15:16:49] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: Maps - tilerator on all maps servers needs access to postgresql master (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/319893 (https://phabricator.wikimedia.org/T147223) (owner: 10Gehel)
[15:18:01] <grrrit-wm>	 (03PS1) 10Marostegui: Repool db2042 - the maintenance is post poned as db2034 has hardware issues and cannot even receive all the data (T149553#2776069) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320207 
[15:18:16] <grrrit-wm>	 (03PS2) 10Marostegui: Repool db2042 - the maintenance is post poned as db2034 has hardware issues and cannot even receive all the data (T149553#2776069) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320207 
[15:18:18] <wikibugs>	 06Operations, 13Patch-For-Review: Not all packages from packages::statistics are available on jessie - https://phabricator.wikimedia.org/T150003#2776465 (10Halfak) The myspell packages are still needed.  If some aren't available, we can replace them with the best available aspell or hunspell packages.
[15:18:26] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 234, down: 0, dormant: 0, excluded: 0, unused: 0
[15:18:37] <icinga-wm>	 RECOVERY - puppet last run on cp3030 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[15:18:45] <grrrit-wm>	 (03CR) 10Jcrespo: "+1" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320207 (owner: 10Marostegui)
[15:19:23] <hashar>	 !log Restarting Jenkins (deadlock in beta cluster Jenkins jobs)
[15:19:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:20:28] <wikibugs>	 06Operations, 10Monitoring: Huge log files on icinga machines - https://phabricator.wikimedia.org/T150061#2776474 (10Joe) >>! In T150061#2776176, @Volans wrote: > @fgiunchedi @akosiaris @Joe  > From the looks of it (I've take just a quick look, correct me if I'm wrong): >   - Puppet spam is caused by a missing...
[15:20:34] <grrrit-wm>	 (03CR) 10Marostegui: [C: 032] Repool db2042 - the maintenance is post poned as db2034 has hardware issues and cannot even receive all the data (T149553#2776069) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320207 (owner: 10Marostegui)
[15:21:25] <grrrit-wm>	 (03Merged) 10jenkins-bot: Repool db2042 - the maintenance is post poned as db2034 has hardware issues and cannot even receive all the data (T149553#2776069) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320207 (owner: 10Marostegui)
[15:21:26] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0
[15:22:47] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2042 - T149553 (duration: 00m 49s)
[15:22:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:22:53] <stashbot>	 T149553: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553
[15:23:49] <wikibugs>	 06Operations, 10Monitoring: Huge log files on icinga machines - https://phabricator.wikimedia.org/T150061#2776482 (10akosiaris) >>! In T150061#2776176, @Volans wrote: > @fgiunchedi @akosiaris @Joe  > From the looks of it (I've take just a quick look, correct me if I'm wrong): >   - Puppet spam is caused by a m...
[15:23:59] <icinga-wm>	 RECOVERY - puppet last run on analytics1053 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[15:26:34] <elukey>	 !log rebooting kafka1013 for kernel upgrades
[15:26:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:27:19] <grrrit-wm>	 (03PS1) 10ArielGlenn: no bare exceptions; all exceptions use "except blah as ex" [dumps] - 10https://gerrit.wikimedia.org/r/320208 
[15:29:09] <grrrit-wm>	 (03PS3) 10Gehel: Maps - tilerator on all maps servers needs access to postgresql master [puppet] - 10https://gerrit.wikimedia.org/r/319893 (https://phabricator.wikimedia.org/T147223) 
[15:30:55] <grrrit-wm>	 (03CR) 10Gehel: Maps - tilerator on all maps servers needs access to postgresql master (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/319893 (https://phabricator.wikimedia.org/T147223) (owner: 10Gehel)
[15:31:03] <mark>	 !log Disabling OSPF/OSPF3 on cr2-codfw:xe-5/0/1 for eqiad side port move
[15:31:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:31:26] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [C: 031] Maps - tilerator on all maps servers needs access to postgresql master [puppet] - 10https://gerrit.wikimedia.org/r/319893 (https://phabricator.wikimedia.org/T147223) (owner: 10Gehel)
[15:33:21] <grrrit-wm>	 (03PS1) 10ArielGlenn: maxint -> maxsize (python 3 fix) [dumps] - 10https://gerrit.wikimedia.org/r/320209 
[15:38:20] <mark>	 !log Reenabling OSPF/OSPF3 on cr2-codfw:xe-5/0/1 after eqiad side port move to xe-3/2/3
[15:38:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:39:36] <grrrit-wm>	 (03CR) 10Elukey: "https://grafana.wikimedia.org/dashboard/db/kafka?panelId=34 shows some good examples about how nf_conntrack varies over time. Usually it " [puppet] - 10https://gerrit.wikimedia.org/r/319071 (owner: 10Muehlenhoff)
[15:39:52] <moritzm>	 !log rebooting radium for kernel update
[15:39:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:42:40] <icinga-wm>	 PROBLEM - Kafka MirrorMaker main-eqiad_to_analytics on kafka1012 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_analytics/producer\.properties
[15:43:11] <wikibugs>	 06Operations, 10Monitoring: Huge log files on icinga machines - https://phabricator.wikimedia.org/T150061#2776531 (10Joe) Ok, after a few puppet runs it seems clear to me that ordering is not constant anymore when coming from puppetdb.  I will try to set the order in the puppetdb query first.
[15:43:14] <elukey>	 ah snap
[15:43:40] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[15:44:00] <icinga-wm>	 PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[15:44:19] <elukey>	 !log started kafka-mirror-main-eqiad_to_analytics.service on kafka1012
[15:44:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:44:40] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[15:44:40] <icinga-wm>	 RECOVERY - Kafka MirrorMaker main-eqiad_to_analytics on kafka1012 is OK: PROCS OK: 1 process with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_analytics/producer\.properties
[15:46:21] <grrrit-wm>	 (03PS1) 10ArielGlenn: use sorted() everywhere instead of sort() (ptyhon 3 change) [dumps] - 10https://gerrit.wikimedia.org/r/320211 
[15:47:40] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[15:50:00] <wikibugs>	 06Operations, 10ops-eqiad, 10hardware-requests: Return wmf4747/wmf4748/wmf4749/wmf4750 to spares - https://phabricator.wikimedia.org/T146171#2776552 (10RobH) a:03RobH
[15:50:58] <grrrit-wm>	 (03PS2) 10Mark Bergsma: Reflect new FPC3 ports after cr1-/cr2-eqiad FPC5 decommissioning [dns] - 10https://gerrit.wikimedia.org/r/319617 
[15:52:00] <icinga-wm>	 RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[15:52:40] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[15:54:35] <wikibugs>	 06Operations, 10netops: HTCP purges flood across CODFW - https://phabricator.wikimedia.org/T133387#2776585 (10akosiaris) FTR, this still holds true today. There isn't really any reason it should have been fixed, just noting it.
[15:54:44] <mark>	 !log Disabling cr2-eqiad BGP groups IX4/IX6 (all Equinix Ashburn BGP sessions)
[15:54:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:55:10] <grrrit-wm>	 (03Abandoned) 10ArielGlenn: Fixed PEP-8 issues [dumps] - 10https://gerrit.wikimedia.org/r/207504 (owner: 10Dereckson)
[15:57:53] <grrrit-wm>	 (03PS2) 10ArielGlenn: comment cleanup [dumps] - 10https://gerrit.wikimedia.org/r/207712 (owner: 10Dereckson)
[16:00:25] <mark>	 !log Chris moved cr2-eqiad:xe-5/3/3 to xe-3/3/3
[16:00:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:00:57] <grrrit-wm>	 (03PS5) 10Ottomata: eventstreams puppetization [puppet] - 10https://gerrit.wikimedia.org/r/317981 (https://phabricator.wikimedia.org/T148779) 
[16:01:45] <mark>	 !log Reactivated cr2-eqiad IX6 BGP group (ipv6 sessions)
[16:01:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:02:10] <icinga-wm>	 PROBLEM - puppet last run on db1041 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:03:27] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] eventstreams puppetization [puppet] - 10https://gerrit.wikimedia.org/r/317981 (https://phabricator.wikimedia.org/T148779) (owner: 10Ottomata)
[16:04:21] <grrrit-wm>	 (03CR) 10ArielGlenn: [C: 032] use sorted() everywhere instead of sort() (ptyhon 3 change) [dumps] - 10https://gerrit.wikimedia.org/r/320211 (owner: 10ArielGlenn)
[16:07:22] <grrrit-wm>	 (03CR) 10ArielGlenn: [C: 032] no bare exceptions; all exceptions use "except blah as ex" [dumps] - 10https://gerrit.wikimedia.org/r/320208 (owner: 10ArielGlenn)
[16:07:45] <grrrit-wm>	 (03CR) 10ArielGlenn: [C: 032] maxint -> maxsize (python 3 fix) [dumps] - 10https://gerrit.wikimedia.org/r/320209 (owner: 10ArielGlenn)
[16:08:39] <grrrit-wm>	 (03CR) 10ArielGlenn: [C: 032] comment cleanup [dumps] - 10https://gerrit.wikimedia.org/r/207712 (owner: 10Dereckson)
[16:11:26] <grrrit-wm>	 (03CR) 10ArielGlenn: "these scripts look a lot different (now that the production scripts are in master branch), I need to see if they are still sh only syntax " [dumps] - 10https://gerrit.wikimedia.org/r/207694 (owner: 10Dereckson)
[16:13:16] <grrrit-wm>	 (03CR) 10Gilles: "I see that firejail has support for rlimits. It might be what's getting in the way and we can try to use that instead." [puppet] - 10https://gerrit.wikimedia.org/r/319802 (https://phabricator.wikimedia.org/T145878) (owner: 10Gilles)
[16:20:40] <icinga-wm>	 PROBLEM - IPsec on cp1068 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp2013_v6, cp2016_v6
[16:20:50] <icinga-wm>	 PROBLEM - IPsec on cp1055 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp2013_v6, cp2016_v6
[16:20:50] <icinga-wm>	 PROBLEM - IPsec on cp1053 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp2013_v6, cp2016_v6
[16:20:50] <icinga-wm>	 PROBLEM - IPsec on cp3008 is CRITICAL: Strongswan CRITICAL - ok: 27 connecting: cp2018_v6
[16:20:50] <icinga-wm>	 PROBLEM - IPsec on cp3045 is CRITICAL: Strongswan CRITICAL - ok: 52 connecting: cp2014_v6, cp2017_v6
[16:20:50] <icinga-wm>	 PROBLEM - IPsec on cp1071 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2014_v6, cp2017_v6
[16:20:51] <icinga-wm>	 PROBLEM - IPsec on cp3035 is CRITICAL: Strongswan CRITICAL - ok: 52 connecting: cp2014_v6, cp2017_v6
[16:21:00] <icinga-wm>	 PROBLEM - IPsec on cp1059 is CRITICAL: Strongswan CRITICAL - ok: 23 not-conn: cp2015_v6
[16:21:00] <icinga-wm>	 PROBLEM - IPsec on cp1060 is CRITICAL: Strongswan CRITICAL - ok: 23 not-conn: cp2015_v6
[16:21:00] <icinga-wm>	 PROBLEM - IPsec on cp1054 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp2013_v6, cp2016_v6
[16:21:00] <icinga-wm>	 PROBLEM - IPsec on cp3044 is CRITICAL: Strongswan CRITICAL - ok: 52 connecting: cp2014_v6, cp2017_v6
[16:21:10] <icinga-wm>	 PROBLEM - IPsec on cp1062 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2014_v6, cp2017_v6
[16:21:10] <icinga-wm>	 PROBLEM - IPsec on kafka1012 is CRITICAL: Strongswan CRITICAL - ok: 142 connecting: cp2013_v6, cp2014_v6, cp2015_v6, cp2016_v6, cp2017_v6, cp2018_v6
[16:21:10] <icinga-wm>	 PROBLEM - IPsec on cp2017 is CRITICAL: Strongswan CRITICAL - ok: 35 connecting: (unnamed), cp1048_v6, cp1049_v6, cp1050_v6, cp1062_v6, cp1063_v6, cp1064_v6, cp1071_v6, cp1072_v6, cp1073_v6, cp1074_v6, cp1099_v6,kafka1012_v6,kafka1014_v6,kafka1020_v6,kafka1022_v6 not-conn: cp3034_v6, cp3035_v6, cp3036_v6, cp3037_v6, cp3038_v6, cp3039_v6, cp3044_v6, cp3045_v6, cp3046_v6, cp3047_v6, cp3048_v6, cp3049_v6, cp4005_v6, cp4006_v6, cp40
[16:21:10] <icinga-wm>	 PROBLEM - IPsec on kafka1020 is CRITICAL: Strongswan CRITICAL - ok: 142 connecting: cp2013_v6, cp2014_v6, cp2015_v6, cp2016_v6, cp2017_v6, cp2018_v6
[16:21:10] <icinga-wm>	 PROBLEM - IPsec on cp1051 is CRITICAL: Strongswan CRITICAL - ok: 23 not-conn: cp2018_v6
[16:21:11] <icinga-wm>	 PROBLEM - IPsec on cp4011 is CRITICAL: Strongswan CRITICAL - ok: 27 connecting: cp2015_v6
[16:21:11] <icinga-wm>	 PROBLEM - IPsec on cp4002 is CRITICAL: Strongswan CRITICAL - ok: 27 connecting: cp2018_v6
[16:21:12] <icinga-wm>	 PROBLEM - IPsec on cp4020 is CRITICAL: Strongswan CRITICAL - ok: 27 connecting: cp2015_v6
[16:21:12] <icinga-wm>	 PROBLEM - IPsec on cp4001 is CRITICAL: Strongswan CRITICAL - ok: 27 connecting: cp2018_v6
[16:21:13] <icinga-wm>	 PROBLEM - IPsec on cp4012 is CRITICAL: Strongswan CRITICAL - ok: 27 connecting: cp2015_v6
[16:21:13] <icinga-wm>	 PROBLEM - IPsec on cp4004 is CRITICAL: Strongswan CRITICAL - ok: 27 connecting: cp2018_v6
[16:21:14] <icinga-wm>	 PROBLEM - IPsec on cp4019 is CRITICAL: Strongswan CRITICAL - ok: 27 connecting: cp2015_v6
[16:21:14] <icinga-wm>	 PROBLEM - IPsec on cp4007 is CRITICAL: Strongswan CRITICAL - ok: 52 connecting: cp2014_v6, cp2017_v6
[16:21:20] <icinga-wm>	 PROBLEM - IPsec on cp1073 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2014_v6, cp2017_v6
[16:21:20] <icinga-wm>	 PROBLEM - IPsec on cp2014 is CRITICAL: Strongswan CRITICAL - ok: 35 connecting: (unnamed), cp1048_v6, cp1049_v6, cp1050_v6, cp1062_v6, cp1063_v6, cp1064_v6, cp1071_v6, cp1072_v6, cp1073_v6, cp1074_v6, cp1099_v6, cp3034_v6, cp3035_v6, cp3038_v6, cp3039_v6, cp3045_v6, cp3047_v6, cp4005_v6, cp4006_v6, cp4013_v6, cp4014_v6, cp4015_v6,kafka1012_v6,kafka1014_v6,kafka1020_v6,kafka1022_v6 not-conn: cp3036_v6, cp3037_v6, cp3044_v6, cp30
[16:21:20] <icinga-wm>	 PROBLEM - IPsec on cp1048 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2014_v6, cp2017_v6
[16:21:20] <icinga-wm>	 PROBLEM - IPsec on kafka1018 is CRITICAL: Strongswan CRITICAL - ok: 142 connecting: cp2013_v6, cp2014_v6, cp2015_v6, cp2016_v6, cp2017_v6, cp2018_v6
[16:21:20] <icinga-wm>	 PROBLEM - IPsec on cp4014 is CRITICAL: Strongswan CRITICAL - ok: 52 connecting: cp2014_v6, cp2017_v6
[16:21:21] <icinga-wm>	 PROBLEM - IPsec on cp3038 is CRITICAL: Strongswan CRITICAL - ok: 52 connecting: cp2014_v6, cp2017_v6
[16:21:21] <icinga-wm>	 PROBLEM - IPsec on cp3036 is CRITICAL: Strongswan CRITICAL - ok: 52 connecting: cp2014_v6, cp2017_v6
[16:21:30] <icinga-wm>	 PROBLEM - IPsec on cp1074 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2014_v6, cp2017_v6
[16:21:30] <icinga-wm>	 PROBLEM - IPsec on cp1061 is CRITICAL: Strongswan CRITICAL - ok: 23 not-conn: cp2018_v6
[16:21:30] <icinga-wm>	 PROBLEM - IPsec on cp1047 is CRITICAL: Strongswan CRITICAL - ok: 23 not-conn: cp2015_v6
[16:21:30] <icinga-wm>	 PROBLEM - IPsec on cp4003 is CRITICAL: Strongswan CRITICAL - ok: 27 connecting: cp2018_v6
[16:21:30] <icinga-wm>	 PROBLEM - IPsec on cp3034 is CRITICAL: Strongswan CRITICAL - ok: 52 connecting: cp2014_v6, cp2017_v6
[16:21:31] <icinga-wm>	 PROBLEM - IPsec on cp3006 is CRITICAL: Strongswan CRITICAL - ok: 27 connecting: cp2015_v6
[16:21:31] <icinga-wm>	 PROBLEM - IPsec on cp3005 is CRITICAL: Strongswan CRITICAL - ok: 27 connecting: cp2015_v6
[16:21:32] <icinga-wm>	 PROBLEM - IPsec on cp3037 is CRITICAL: Strongswan CRITICAL - ok: 52 connecting: cp2014_v6, cp2017_v6
[16:21:32] <icinga-wm>	 PROBLEM - IPsec on cp3048 is CRITICAL: Strongswan CRITICAL - ok: 52 connecting: cp2014_v6, cp2017_v6
[16:21:33] <icinga-wm>	 PROBLEM - IPsec on cp1064 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2014_v6, cp2017_v6
[16:21:33] <icinga-wm>	 PROBLEM - IPsec on cp2015 is CRITICAL: Strongswan CRITICAL - ok: 18 connecting: (unnamed), cp1046_v6, cp1047_v6, cp1059_v6, cp1060_v6, cp3003_v6, cp3004_v6, cp4011_v6, cp4012_v6,kafka1012_v6,kafka1014_v6,kafka1020_v6,kafka1022_v6 not-conn: cp3005_v6, cp3006_v6, cp4019_v6, cp4020_v6,kafka1013_v6,kafka1018_v6
[16:21:34] <icinga-wm>	 PROBLEM - IPsec on cp4006 is CRITICAL: Strongswan CRITICAL - ok: 52 connecting: cp2014_v6, cp2017_v6
[16:21:34] <icinga-wm>	 PROBLEM - IPsec on cp4013 is CRITICAL: Strongswan CRITICAL - ok: 52 connecting: cp2014_v6, cp2017_v6
[16:21:35] <icinga-wm>	 PROBLEM - IPsec on cp3004 is CRITICAL: Strongswan CRITICAL - ok: 27 connecting: cp2015_v6
[16:21:50] <icinga-wm>	 PROBLEM - IPsec on cp1046 is CRITICAL: Strongswan CRITICAL - ok: 23 not-conn: cp2015_v6
[16:21:51] <icinga-wm>	 PROBLEM - IPsec on cp1049 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2014_v6, cp2017_v6
[16:21:51] <icinga-wm>	 PROBLEM - IPsec on cp1099 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2014_v6, cp2017_v6
[16:21:51] <icinga-wm>	 PROBLEM - IPsec on kafka1022 is CRITICAL: Strongswan CRITICAL - ok: 142 connecting: cp2013_v6, cp2014_v6, cp2015_v6, cp2016_v6, cp2017_v6, cp2018_v6
[16:21:51] <icinga-wm>	 PROBLEM - IPsec on cp3009 is CRITICAL: Strongswan CRITICAL - ok: 27 connecting: cp2018_v6
[16:22:02] <icinga-wm>	 PROBLEM - IPsec on cp1045 is CRITICAL: Strongswan CRITICAL - ok: 23 not-conn: cp2018_v6
[16:22:02] <icinga-wm>	 PROBLEM - IPsec on cp1050 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2014_v6, cp2017_v6
[16:22:02] <icinga-wm>	 PROBLEM - IPsec on cp1063 is CRITICAL: Strongswan CRITICAL - ok: 54 not-conn: cp2014_v6, cp2017_v6
[16:22:02] <icinga-wm>	 PROBLEM - IPsec on cp1067 is CRITICAL: Strongswan CRITICAL - ok: 42 not-conn: cp2013_v6, cp2016_v6
[16:22:14] <grrrit-wm>	 (03PS1) 10Gilles: Define Thumbor file size rlimit in firejail, not systemd [puppet] - 10https://gerrit.wikimedia.org/r/320216 (https://phabricator.wikimedia.org/T145878) 
[16:23:20] <grrrit-wm>	 (03CR) 10Gilles: "Verified on beta: gilles@deployment-imagescaler01:~$ systemctl status thumbor@8801" [puppet] - 10https://gerrit.wikimedia.org/r/320216 (https://phabricator.wikimedia.org/T145878) (owner: 10Gilles)
[16:24:30] <icinga-wm>	 RECOVERY - IPsec on cp1074 is OK: Strongswan OK - 56 ESP OK
[16:24:30] <icinga-wm>	 RECOVERY - IPsec on cp1047 is OK: Strongswan OK - 24 ESP OK
[16:24:30] <icinga-wm>	 RECOVERY - IPsec on cp1061 is OK: Strongswan OK - 24 ESP OK
[16:24:30] <icinga-wm>	 RECOVERY - IPsec on cp4003 is OK: Strongswan OK - 28 ESP OK
[16:24:30] <icinga-wm>	 RECOVERY - IPsec on cp3034 is OK: Strongswan OK - 54 ESP OK
[16:24:31] <icinga-wm>	 RECOVERY - IPsec on cp3006 is OK: Strongswan OK - 28 ESP OK
[16:24:31] <icinga-wm>	 RECOVERY - IPsec on cp3037 is OK: Strongswan OK - 54 ESP OK
[16:24:32] <icinga-wm>	 RECOVERY - IPsec on cp3005 is OK: Strongswan OK - 28 ESP OK
[16:24:32] <icinga-wm>	 RECOVERY - IPsec on cp3048 is OK: Strongswan OK - 54 ESP OK
[16:24:33] <icinga-wm>	 RECOVERY - IPsec on cp1064 is OK: Strongswan OK - 56 ESP OK
[16:24:33] <icinga-wm>	 RECOVERY - IPsec on cp2015 is OK: Strongswan OK - 36 ESP OK
[16:24:34] <icinga-wm>	 RECOVERY - IPsec on cp4013 is OK: Strongswan OK - 54 ESP OK
[16:24:34] <icinga-wm>	 RECOVERY - IPsec on cp4006 is OK: Strongswan OK - 54 ESP OK
[16:24:35] <icinga-wm>	 RECOVERY - IPsec on cp3004 is OK: Strongswan OK - 28 ESP OK
[16:24:50] <icinga-wm>	 RECOVERY - IPsec on cp1046 is OK: Strongswan OK - 24 ESP OK
[16:24:50] <icinga-wm>	 RECOVERY - IPsec on cp1055 is OK: Strongswan OK - 44 ESP OK
[16:24:50] <icinga-wm>	 RECOVERY - IPsec on cp1053 is OK: Strongswan OK - 44 ESP OK
[16:24:50] <icinga-wm>	 RECOVERY - IPsec on cp1049 is OK: Strongswan OK - 56 ESP OK
[16:24:50] <icinga-wm>	 RECOVERY - IPsec on cp3008 is OK: Strongswan OK - 28 ESP OK
[16:24:51] <icinga-wm>	 RECOVERY - IPsec on cp3045 is OK: Strongswan OK - 54 ESP OK
[16:24:51] <icinga-wm>	 RECOVERY - IPsec on cp1099 is OK: Strongswan OK - 56 ESP OK
[16:24:52] <icinga-wm>	 RECOVERY - IPsec on kafka1022 is OK: Strongswan OK - 148 ESP OK
[16:24:52] <icinga-wm>	 RECOVERY - IPsec on cp3009 is OK: Strongswan OK - 28 ESP OK
[16:24:53] <icinga-wm>	 RECOVERY - IPsec on cp1071 is OK: Strongswan OK - 56 ESP OK
[16:24:53] <icinga-wm>	 RECOVERY - IPsec on cp3035 is OK: Strongswan OK - 54 ESP OK
[16:25:00] <icinga-wm>	 RECOVERY - IPsec on cp1059 is OK: Strongswan OK - 24 ESP OK
[16:25:00] <icinga-wm>	 RECOVERY - IPsec on cp1045 is OK: Strongswan OK - 24 ESP OK
[16:25:00] <icinga-wm>	 RECOVERY - IPsec on cp1060 is OK: Strongswan OK - 24 ESP OK
[16:25:00] <icinga-wm>	 RECOVERY - IPsec on cp1054 is OK: Strongswan OK - 44 ESP OK
[16:25:00] <icinga-wm>	 RECOVERY - IPsec on cp3044 is OK: Strongswan OK - 54 ESP OK
[16:25:01] <icinga-wm>	 RECOVERY - IPsec on cp1050 is OK: Strongswan OK - 56 ESP OK
[16:25:01] <icinga-wm>	 RECOVERY - IPsec on cp1063 is OK: Strongswan OK - 56 ESP OK
[16:25:02] <icinga-wm>	 RECOVERY - IPsec on cp1067 is OK: Strongswan OK - 44 ESP OK
[16:25:10] <icinga-wm>	 RECOVERY - IPsec on cp1062 is OK: Strongswan OK - 56 ESP OK
[16:25:10] <icinga-wm>	 RECOVERY - IPsec on kafka1012 is OK: Strongswan OK - 148 ESP OK
[16:25:10] <icinga-wm>	 RECOVERY - IPsec on cp2017 is OK: Strongswan OK - 70 ESP OK
[16:25:10] <icinga-wm>	 RECOVERY - IPsec on kafka1020 is OK: Strongswan OK - 148 ESP OK
[16:25:10] <icinga-wm>	 RECOVERY - IPsec on cp1051 is OK: Strongswan OK - 24 ESP OK
[16:25:11] <icinga-wm>	 RECOVERY - IPsec on cp4011 is OK: Strongswan OK - 28 ESP OK
[16:25:11] <icinga-wm>	 RECOVERY - IPsec on cp4020 is OK: Strongswan OK - 28 ESP OK
[16:25:12] <icinga-wm>	 RECOVERY - IPsec on cp4004 is OK: Strongswan OK - 28 ESP OK
[16:25:12] <icinga-wm>	 RECOVERY - IPsec on cp4001 is OK: Strongswan OK - 28 ESP OK
[16:25:13] <icinga-wm>	 RECOVERY - IPsec on cp4002 is OK: Strongswan OK - 28 ESP OK
[16:25:13] <icinga-wm>	 RECOVERY - IPsec on cp4012 is OK: Strongswan OK - 28 ESP OK
[16:25:14] <icinga-wm>	 RECOVERY - IPsec on cp4019 is OK: Strongswan OK - 28 ESP OK
[16:25:14] <icinga-wm>	 RECOVERY - IPsec on cp4007 is OK: Strongswan OK - 54 ESP OK
[16:25:20] <icinga-wm>	 RECOVERY - IPsec on cp1073 is OK: Strongswan OK - 56 ESP OK
[16:25:20] <icinga-wm>	 RECOVERY - IPsec on cp2014 is OK: Strongswan OK - 70 ESP OK
[16:25:20] <icinga-wm>	 RECOVERY - IPsec on cp1048 is OK: Strongswan OK - 56 ESP OK
[16:25:20] <icinga-wm>	 RECOVERY - IPsec on kafka1018 is OK: Strongswan OK - 148 ESP OK
[16:25:20] <icinga-wm>	 RECOVERY - IPsec on cp4014 is OK: Strongswan OK - 54 ESP OK
[16:25:21] <icinga-wm>	 RECOVERY - IPsec on cp3038 is OK: Strongswan OK - 54 ESP OK
[16:25:21] <icinga-wm>	 RECOVERY - IPsec on cp3036 is OK: Strongswan OK - 54 ESP OK
[16:28:49] <grrrit-wm>	 (03CR) 10ArielGlenn: "Indeed these scripts now have bashisms in them. I am open to patchsets that would keep the functionality and make them sh compatible." [dumps] - 10https://gerrit.wikimedia.org/r/207694 (owner: 10Dereckson)
[16:29:17] <wikibugs>	 06Operations, 10netops: Migrate links from cr1-eqiad/cr2-eqiad fpc 5 to fpc 3 - https://phabricator.wikimedia.org/T149196#2776789 (10mark) a:03mark All ports have been moved off of FPC5 on both routers, and all configuration for FPC5 ports has been removed.  The Equinix Ashburn IXP port is still awaiting the...
[16:30:10] <icinga-wm>	 RECOVERY - puppet last run on db1041 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[16:30:21] <grrrit-wm>	 (03PS3) 10Mark Bergsma: Reflect new FPC3 ports after cr1-/cr2-eqiad FPC5 decommissioning [dns] - 10https://gerrit.wikimedia.org/r/319617 (https://phabricator.wikimedia.org/T149196) 
[16:34:08] <grrrit-wm>	 (03PS1) 10EBernhardson: Setup CirrusSearch interwiki load test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320220 (https://phabricator.wikimedia.org/T149740) 
[16:34:30] <arseny92>	 by the way, i guess T131385 should have been deployed months ago hashar ?
[16:34:30] <stashbot>	 T131385: Dynamically fiddle with wgLocalDatabases to recognise wikitech separation - https://phabricator.wikimedia.org/T131385
[16:35:23] <hashar>	 arseny92: I have no idea
[16:36:01] <arseny92>	 as you triaged normal and prod impact
[16:37:35] <arseny92>	 the patch looks good i guess
[16:42:33] <hashar>	 arseny92: I just quickly triaged that task on the wikimedia-log-errors board. I am not involved in having it fixed
[16:51:23] <wikibugs>	 06Operations, 06Discovery, 06Maps: Investigate how Kartotherian metrics are published and what they mean - https://phabricator.wikimedia.org/T149889#2776874 (10Gehel) A quick look at graphite1001 indicates that we already publish ~ 64k metrics for Kartotherian: ``` gehel@graphite1001:~$ find /var/lib/carbon/...
[16:55:10] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[16:56:10] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3077470 keys, up 7 days 8 hours - replication_delay is 0
[16:57:30] <ottomata>	 moritzm: am curious, and maybe you know, will kernel live patching come to debian sometime soon (does it exist already?)
[16:58:22] <grrrit-wm>	 (03PS1) 10Muehlenhoff: statistics::packages: Remove zpubsub [puppet] - 10https://gerrit.wikimedia.org/r/320227 (https://phabricator.wikimedia.org/T150003) 
[16:59:05] <moritzm>	 ottomata: no, not anytime soon. the generic in-kernel support isn't complete yet
[16:59:20] <ottomata>	 oh hm, thought i saw something saying it was.  aye cool
[17:00:00] <moritzm>	 Ubuntu has offered a live patching service for a few weeks, but it's useless
[17:08:53] <wikibugs>	 06Operations, 13Patch-For-Review: Not all packages from packages::statistics are available on jessie - https://phabricator.wikimedia.org/T150003#2776958 (10MoritzMuehlenhoff) >>! In T150003#2776465, @Halfak wrote: > The myspell packages are still needed.  If some aren't available, we can replace them with the...
[17:15:08] <grrrit-wm>	 (03CR) 10Faidon Liambotis: [C: 04-1] Reflect new FPC3 ports after cr1-/cr2-eqiad FPC5 decommissioning (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/319617 (https://phabricator.wikimedia.org/T149196) (owner: 10Mark Bergsma)
[17:16:01] <wikibugs>	 06Operations, 10ops-eqiad, 10DBA, 06Labs, and 2 others: Move dbproxy1010 and dbproxy1011 to labs-support network, rename them to labsdbproxy1001 and labsdbproxy1002 - https://phabricator.wikimedia.org/T149170#2776977 (10mark) Approved.
[17:17:38] <grrrit-wm>	 (03CR) 10Hashar: [C: 031] "We can drop the 'zuul.eqiad.wmnet' DNS entry now. The last user was Nodepool and it is now pointing directly to contint1001.wikimedia.org." [dns] - 10https://gerrit.wikimedia.org/r/319675 (owner: 10Hashar)
[17:22:44] <wikibugs>	 06Operations, 13Patch-For-Review: Not all packages from packages::statistics are available on jessie - https://phabricator.wikimedia.org/T150003#2777038 (10Ottomata) Or, for now, we could include these from somewhere other than the class that will be included on the notebook hosts.  Or conditionally only inclu...
[17:23:33] <wikibugs>	 06Operations, 10ArchCom-RfC, 06Commons, 10MediaWiki-File-management, and 14 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#2777040 (10Gilles) >>! In T66214#2772934, @Tgr wrote: > Projects which can get away with being Wikimedia-only (such as the mobile apps) could just use...
[17:24:25] <wikibugs>	 06Operations, 10ops-eqiad, 10DBA, 06Labs, and 2 others: Move dbproxy1010 and dbproxy1011 to labs-support network, rename them to labsdbproxy1001 and labsdbproxy1002 - https://phabricator.wikimedia.org/T149170#2777041 (10jcrespo) a:05jcrespo>03None @RobH You mentioned it may not need a physical movement...
[17:28:05] <wikibugs>	 06Operations, 10ops-eqiad, 10DBA, 06Labs, and 2 others: Move dbproxy1010 and dbproxy1011 to labs-support network, rename them to labsdbproxy1001 and labsdbproxy1002 - https://phabricator.wikimedia.org/T149170#2777052 (10RobH) I said that in reply to the labs to db server transition, not the transition of t...
[17:29:51] <wikibugs>	 06Operations: eqiad: 1 hardware access request for labs on real hardware (mwoffliner) - https://phabricator.wikimedia.org/T117095#2777054 (10AlexMonk-WMF) >>! In T117095#2775967, @Kelson wrote: > We cannot really do a monthly snapshot (which would be a good thing) and we have pretty serious difficulties to create ZIM...
[17:30:11] <wikibugs>	 06Operations, 10ops-eqiad, 10DBA, 06Labs, and 2 others: Move dbproxy1010 and dbproxy1011 to labs-support network, rename them to labsdbproxy1001 and labsdbproxy1002 - https://phabricator.wikimedia.org/T149170#2777055 (10jcrespo) Sorry for the misunderstanding!
[17:36:43] <wikibugs>	 06Operations, 13Patch-For-Review: Not all packages from packages::statistics are available on jessie - https://phabricator.wikimedia.org/T150003#2777063 (10MoritzMuehlenhoff) I've rebuilt all the dict packages (seven were imported from trusty and two from xenial since they failed to build on jessie), uploads...
[17:37:25] <andrewbogott>	 godog: can you please either enable puppet and updates on filippo-test-trusty or delete the instance?  It's been needing a lot of special attention lately :(
[17:37:52] <bd808>	 jouncebot: now
[17:37:52] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 22 minute(s)
[17:37:56] <bd808>	 jouncebot: next
[17:37:57] <jouncebot>	 In 0 hour(s) and 22 minute(s): Weekly Wikidata query service deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161107T1800)
[17:38:29] <bd808>	 I've got a backport that will help with ELK logging issues that we are having.
[17:38:49] <bd808>	 greg-g: can I sneak this out? -- https://gerrit.wikimedia.org/r/#/c/320232/
[17:39:01] <godog>	 andrewbogott: yeah delete it if it's causing problems
[17:39:10] <andrewbogott>	 godog: ok, thanks
[17:39:23] <greg-g>	 bd808: sneak away
[17:40:36] <jynus>	 !log performing schema change on s7 (imagelinks) T139090
[17:40:40] <wikibugs>	 06Operations, 06Performance-Team, 10Thumbor: Make Thumbor IM engine based on a subprocess - https://phabricator.wikimedia.org/T149903#2777087 (10Gilles) Might not be necessary after all if this works out: https://gerrit.wikimedia.org/r/#/c/319807/
[17:40:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:40:42] <stashbot>	 T139090: Deploy I2b042685 to all databases - https://phabricator.wikimedia.org/T139090
[17:42:29] <wikibugs>	 06Operations, 13Patch-For-Review: Not all packages from packages::statistics are available on jessie - https://phabricator.wikimedia.org/T150003#2777093 (10MoritzMuehlenhoff) The packages were imported from trusty and built for jessie-wikimedia: dict-nr_20070206-4ubuntu1+wmf1_amd64.changes dict-ns_20070206-4ub...
[17:44:10] * bd808 twiddles thumbs while jenkins does its thing
[17:44:18] <grrrit-wm>	 (03CR) 10Ottomata: [C: 031] statistics::packages: Remove zpubsub [puppet] - 10https://gerrit.wikimedia.org/r/320227 (https://phabricator.wikimedia.org/T150003) (owner: 10Muehlenhoff)
[17:46:02] <Revent>	 brion: Ping?
[17:46:54] <grrrit-wm>	 (03CR) 10Muehlenhoff: [C: 032] statistics::packages: Remove zpubsub [puppet] - 10https://gerrit.wikimedia.org/r/320227 (https://phabricator.wikimedia.org/T150003) (owner: 10Muehlenhoff)
[17:55:58] <wikibugs>	 06Operations, 13Patch-For-Review: Not all packages from packages::statistics are available on jessie - https://phabricator.wikimedia.org/T150003#2777136 (10MoritzMuehlenhoff) So this is down to one package now: python-pygeoip seems to be an internal package. Is that still needed? Debian has python-geoip, maybe...
[17:56:26] <logmsgbot>	 !log bd808@tin Synchronized php-1.29.0-wmf.1/includes/exception/MWExceptionHandler.php: MWExceptionHandler: Do not use 'exception' for custom log data (T150106) (duration: 00m 47s)
[17:56:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:56:32] <stashbot>	 T150106: MediaWiki logs on the EventBus channel causing indexing failures in ELK Elasticsearch - https://phabricator.wikimedia.org/T150106
[17:57:55] <wikibugs>	 06Operations: update-ca-certificates, run via puppets sslcert module, doesn't update symlinks to replaced certificates - https://phabricator.wikimedia.org/T150058#2777154 (10AlexMonk-WMF) >>! In T150058#2775860, @akosiaris wrote: > @AlexMonk-WMF As @Joe said, we 've copied over the CA in production (2 months ago...
[17:58:41] <wikibugs>	 06Operations: update-ca-certificates, run via puppets sslcert module, doesn't update symlinks to replaced certificates - https://phabricator.wikimedia.org/T150058#2777157 (10akosiaris) >>! In T150058#2777154, @AlexMonk-WMF wrote: >>>! In T150058#2775860, @akosiaris wrote: >> @AlexMonk-WMF As @Joe said, we 've co...
[18:00:04] <jouncebot>	 gehel: Dear anthropoid, the time has come. Please deploy Weekly Wikidata query service deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161107T1800).
[18:00:06] <wikibugs>	 06Operations, 13Patch-For-Review: Not all packages from packages::statistics are available on jessie - https://phabricator.wikimedia.org/T150003#2777159 (10Halfak) >>! In T150003#2776958, @MoritzMuehlenhoff wrote: > None of these languages have an aspell or hunspell package in Debian. So, if these are in fact...
[18:00:34] <urandom>	 !log T133395: Convert local_group_*_title__revisions.{data,idx_by_rev_ever} tables to time-window compaction
[18:00:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:00:40] <stashbot>	 T133395: Evaluate TimeWindowCompactionStrategy - https://phabricator.wikimedia.org/T133395
[18:03:16] <gehel>	 SMalyshev: latest wdqs deployed on beta, seems not working, I'm having a look
[18:03:43] <wikibugs>	 06Operations, 13Patch-For-Review: Not all packages from packages::statistics are available on jessie - https://phabricator.wikimedia.org/T150003#2777166 (10MoritzMuehlenhoff) https://github.com/wikimedia/operations-puppet/blob/production/modules/ores/manifests/base.pp#L20 is only a subset of dictionaries, the...
[18:03:49] <grrrit-wm>	 (03PS4) 10Elukey: First Docker prototype [software/varnish/varnishkafka/testing] - 10https://gerrit.wikimedia.org/r/319548 (https://phabricator.wikimedia.org/T147442) 
[18:06:46] <grrrit-wm>	 (03PS2) 10EBernhardson: Setup CirrusSearch interwiki load test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320220 (https://phabricator.wikimedia.org/T149740) 
[18:07:31] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Setup CirrusSearch interwiki load test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320220 (https://phabricator.wikimedia.org/T149740) (owner: 10EBernhardson)
[18:07:34] <grrrit-wm>	 (03PS3) 10EBernhardson: Setup CirrusSearch interwiki load test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320220 (https://phabricator.wikimedia.org/T149740) 
[18:08:03] <grrrit-wm>	 (03CR) 10DCausse: Setup CirrusSearch interwiki load test (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320220 (https://phabricator.wikimedia.org/T149740) (owner: 10EBernhardson)
[18:09:30] <grrrit-wm>	 (03PS3) 10Dzahn: confluent::kafka::mirror::jmxtrans: key attr is declared more than once [puppet] - 10https://gerrit.wikimedia.org/r/319770 
[18:09:42] <grrrit-wm>	 (03CR) 10Dzahn: [C: 032] confluent::kafka::mirror::jmxtrans: key attr is declared more than once [puppet] - 10https://gerrit.wikimedia.org/r/319770 (owner: 10Dzahn)
[18:09:49] <paladox>	 grrrit-wm: restart
[18:09:51] <grrrit-wm>	 re-connecting to gerrit
[18:09:52] <grrrit-wm>	 reconnected to gerrit
[18:12:53] <grrrit-wm>	 (03PS4) 10EBernhardson: Setup CirrusSearch interwiki load test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320220 (https://phabricator.wikimedia.org/T149740) 
[18:14:22] <grrrit-wm>	 (03PS5) 10Elukey: First Docker prototype [software/varnish/varnishkafka/testing] - 10https://gerrit.wikimedia.org/r/319548 (https://phabricator.wikimedia.org/T147442) 
[18:15:16] <grrrit-wm>	 (03CR) 10EBernhardson: Setup CirrusSearch interwiki load test (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320220 (https://phabricator.wikimedia.org/T149740) (owner: 10EBernhardson)
[18:15:54] <grrrit-wm>	 (03PS2) 10Dzahn: Add projectcom.wikimedia.org to Apache [puppet] - 10https://gerrit.wikimedia.org/r/319123 (https://phabricator.wikimedia.org/T143138) (owner: 10Dereckson)
[18:20:46] <grrrit-wm>	 (03CR) 10Paladox: [C: 031] Add projectcom.wikimedia.org to Apache [puppet] - 10https://gerrit.wikimedia.org/r/319123 (https://phabricator.wikimedia.org/T143138) (owner: 10Dereckson)
[18:28:38] <wikibugs>	 06Operations, 10Citoid, 10VisualEditor, 06Services (blocked): Separate citoid service for beta that runs off master instead of deploy - https://phabricator.wikimedia.org/T92304#2777252 (10Mvolz) >>! In T92304#2767678, @mobrovac wrote: > Nope, @Mvolz that's not it. I will likely delete that instance as it's...
[18:28:48] <wikibugs>	 06Operations, 10Citoid, 10VisualEditor, 06Services (blocked): Separate citoid service for beta that runs off master instead of deploy - https://phabricator.wikimedia.org/T92304#2777253 (10Mvolz) >>! In T92304#2767678, @mobrovac wrote: > Nope, @Mvolz that's not it. I will likely delete that instance as it's...
[18:36:58] <wikibugs>	 06Operations, 06Discovery, 10Elasticsearch, 10hardware-requests, 06Discovery-Search (Current work): elasticsearch new servers (5x eqiad / 12x codfw) - https://phabricator.wikimedia.org/T149089#2777285 (10Gehel) a:03RobH
[18:41:01] <wikibugs>	 06Operations, 06Discovery, 06Maps, 06WMF-Legal, 03Interactive-Sprint: Define tile usage policy - https://phabricator.wikimedia.org/T141815#2777320 (10grin) >>! In T141815#2773589, @Gehel wrote: > @debt: as @BBlack pointed out in the start of this thread, we tend to have a fairly liberal view on who can r...
[18:51:29] <wikibugs>	 07Puppet, 06Labs: Puppet parser, puppet API, and inline docs - https://phabricator.wikimedia.org/T148479#2777347 (10Andrew)
[18:55:39] <wikibugs>	 06Operations, 06Discovery, 06Maps, 06WMF-Legal, 03Interactive-Sprint: Define tile usage policy - https://phabricator.wikimedia.org/T141815#2513156 (10Deskana) I think this task is conflating two separate issues. - Define //technical// tile usage policy (i.e. what users can do without melting the servers...
[19:00:04] <jouncebot>	 addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161107T1900). Please do the needful.
[19:06:05] <grrrit-wm>	 (03CR) 10DCausse: [C: 031] Setup CirrusSearch interwiki load test [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320220 (https://phabricator.wikimedia.org/T149740) (owner: 10EBernhardson)
[19:08:13] <wikibugs>	 06Operations, 07Puppet, 13Patch-For-Review, 07RfC: RFC: New puppet code organization paradigm/coding standards - https://phabricator.wikimedia.org/T147718#2777401 (10Andrew) @_joe_, can you please amend (or provide a new version) of the RFC incorporating the agreements here?  I've re-read this thread but a...
[19:11:30] <grrrit-wm>	 (03PS1) 10Dzahn: base: also install freeipmi on trusty hosts [puppet] - 10https://gerrit.wikimedia.org/r/320246 (https://phabricator.wikimedia.org/T150160) 
[19:12:43] <grrrit-wm>	 (03PS2) 10Dzahn: base: also install freeipmi on trusty hosts [puppet] - 10https://gerrit.wikimedia.org/r/320246 (https://phabricator.wikimedia.org/T150160) 
[19:13:58] <wikibugs>	 06Operations, 13Patch-For-Review: Not all packages from packages::statistics are available on jessie - https://phabricator.wikimedia.org/T150003#2777460 (10Ottomata) Hm, I don't really know why we need python-pygeoip.  It seems like python-geoip should be sufficient, but likely the API is different and pygeoip...
[19:17:54] <wikibugs>	 06Operations, 06Discovery, 06Maps, 06WMF-Legal, 03Interactive-Sprint: Define tile usage policy - https://phabricator.wikimedia.org/T141815#2777484 (10Gehel) @Deskana as always, you have the word of wisdom!  In any case, this is a discussion that will take some time, and that will need to be rolled out pr...
[19:21:20] <wikibugs>	 06Operations, 10Analytics, 10Analytics-Cluster, 06Research-and-Data, and 2 others: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#2777511 (10DarTar) @Ottomata @Nuria: anything else you need from us to get this request processed? Please let us know.
[19:21:22] <godog>	 ottomata: re: jmx_exporter and kafka mirror maker, the easiest I think would be to run it as a 'java agent', e.g. https://github.com/prometheus/jmx_exporter#building-and-running
[19:22:57] <ottomata>	 yeah saw that, we'd have to either patch the confluent deb package, since it provides a CLI wrapper for launching the mirror maker JVM...or just bypass that and run our own java command to launch the process in our systemd unit
[19:23:15] <ottomata>	 java agent sounds nicer, since then the service runs with the jvm
[19:23:27] <ottomata>	 fewer services to manage/monitor
[19:24:20] <wikibugs>	 06Operations, 10Analytics, 10Analytics-Cluster, 06Research-and-Data, and 2 others: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#2777525 (10Nuria) @DarTar: nothing else is needed, ops will get to it as they have the bandwidth
[19:25:57] <godog>	 ottomata: yeah, should be easy to test too, curl the port passed on the command line and that will return the metrics
[19:26:32] <wikibugs>	 06Operations, 10puppet-compiler: puppet compiler claims "no change" when catalogs are actually different - https://phabricator.wikimedia.org/T149432#2777560 (10Volans) I can confirm the bug, example here: https://puppet-compiler.wmflabs.org/4560/analytics1001.eqiad.wmnet/  Partial diff: ``` 25284a25317,25332 >...
[19:26:49] <ottomata>	 godog:  and, if we get this data into prometheus, we will still be able to work with it in grafana, yes?
[19:26:49] <godog>	 ottomata: and the next step is to tell prometheus about the hosts that run mirrormaker and on what port
[19:26:56] <ottomata>	 aye
[19:27:23] <godog>	 ottomata: yeah grafana has both graphite and prometheus in each datacenter as sources
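Note on the exchange above: godog suggests testing the jmx_exporter java agent by curling its port and reading the returned metrics. A minimal sketch of reading such output follows. It assumes the agent serves the plain Prometheus text format without trailing timestamps. The function names, the sample metric lines, and any URL passed to `fetch_metrics` are illustrative assumptions, not taken from the channel.

```python
import urllib.request


def fetch_metrics(url, timeout=5):
    """Fetch the raw metrics text from an exporter endpoint (URL is caller-supplied)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text):
    """Parse Prometheus text-format lines into {name_with_labels: float}.

    Assumes each sample line ends with its value (no trailing timestamp).
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # Split on the last space so label values containing spaces stay intact.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


# Hypothetical sample of what the endpoint might return.
SAMPLE = """\
# HELP jvm_threads_current Current thread count
# TYPE jvm_threads_current gauge
jvm_threads_current 42.0
example_bytes_total{topic="test"} 1024.0
"""

if __name__ == "__main__":
    for name, value in parse_metrics(SAMPLE).items():
        print(name, value)
```

Parsing is kept separate from fetching so the check can run against saved output as well as a live port.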
[19:27:37] <grrrit-wm>	 (03PS4) 10Gehel: Maps - tilerator on all maps servers needs access to postgresql master [puppet] - 10https://gerrit.wikimedia.org/r/319893 (https://phabricator.wikimedia.org/T147223) 
[19:28:24] <grrrit-wm>	 (03CR) 10Volans: [C: 031] "Change LGTM and if we have freeipmi on jessie's hosts I don't see why not on trusty too, so the monitoring effort could be applied to both" [puppet] - 10https://gerrit.wikimedia.org/r/320246 (https://phabricator.wikimedia.org/T150160) (owner: 10Dzahn)
[19:29:14] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Maps - tilerator on all maps servers needs access to postgresql master [puppet] - 10https://gerrit.wikimedia.org/r/319893 (https://phabricator.wikimedia.org/T147223) (owner: 10Gehel)
[19:29:30] <icinga-wm>	 RECOVERY - Host lvs1008 is UP: PING OK - Packet loss = 0%, RTA = 0.32 ms
[19:29:40] <icinga-wm>	 RECOVERY - Host lvs1009 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms
[19:29:40] <icinga-wm>	 RECOVERY - configured eth on lvs1008 is OK: OK - interfaces up
[19:29:50] <grrrit-wm>	 (03PS5) 10Gehel: Maps - tilerator on all maps servers needs access to postgresql master [puppet] - 10https://gerrit.wikimedia.org/r/319893 (https://phabricator.wikimedia.org/T147223) 
[19:29:50] <icinga-wm>	 RECOVERY - configured eth on lvs1009 is OK: OK - interfaces up
[19:33:10] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[19:33:30] <icinga-wm>	 PROBLEM - Host lvs1011 is DOWN: PING CRITICAL - Packet loss = 100%
[19:34:00] <icinga-wm>	 PROBLEM - Host lvs1010 is DOWN: PING CRITICAL - Packet loss = 100%
[19:34:10] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3078048 keys, up 7 days 11 hours - replication_delay is 0
[19:34:10] <icinga-wm>	 PROBLEM - Host lvs1012 is DOWN: PING CRITICAL - Packet loss = 100%
[19:34:20] <icinga-wm>	 RECOVERY - Host lvs1011 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[19:34:30] <icinga-wm>	 RECOVERY - Host lvs1012 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms
[19:34:40] <icinga-wm>	 PROBLEM - Host lvs1009 is DOWN: PING CRITICAL - Packet loss = 100%
[19:34:40] <icinga-wm>	 PROBLEM - Host lvs1008 is DOWN: PING CRITICAL - Packet loss = 100%
[19:38:30] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[19:40:27] <wikibugs>	 06Operations, 10Ops-Access-Requests: Requesting access to fluorine for matanya - https://phabricator.wikimedia.org/T149832#2777662 (10Matanya)
[19:42:41] <wikibugs>	 06Operations, 10Ops-Access-Requests: Requesting access to fluorine for matanya - https://phabricator.wikimedia.org/T149832#2777681 (10Matanya) Public key:  ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDaREf06EZa5CoRLc7sUNqZqDJVIKIxaxpsMsiG98QixtEEGQLw6W0CCwRwls+NVunzERumXJsKaAPBX5ptn5wH4ZXMfrD1y9CWqQFYkImKI4BJjCCef7x...
[19:44:30] <icinga-wm>	 RECOVERY - Host lvs1008 is UP: PING OK - Packet loss = 0%, RTA = 1.07 ms
[19:45:08] <matanya>	 robh: can you please verify i filled this^^ request correctly ?
[19:45:44] <godog>	 is the lvs bouncing expected? 
[19:46:28] <grrrit-wm>	 (03CR) 10Dzahn: [C: 032] "tested on mw1017" [puppet] - 10https://gerrit.wikimedia.org/r/319123 (https://phabricator.wikimedia.org/T143138) (owner: 10Dereckson)
[19:46:30] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[19:47:52] <robh>	 matanya: checking
[19:47:59] <matanya>	 thanks
[19:48:30] <wikibugs>	 06Operations, 10Wikimedia-General-or-Unknown, 10hardware-requests: Extend capacity for video scalers - https://phabricator.wikimedia.org/T150067#2777690 (10Reedy)
[19:49:44] <robh>	 matanya: these are mediawiki logs you want right?
[19:49:52] <robh>	 so group   mw-log-readers: 
[19:49:52] <matanya>	 yes robh
[19:50:08] <robh>	 and your wikitech username is same as your wanted username and irc name right?
[19:50:10] <icinga-wm>	 PROBLEM - check_redis on payments1002 is CRITICAL: CRITICAL ERROR - Can not connect to 127.0.0.1 on port 6379
[19:50:13] <robh>	 we base your uid off that
[19:50:52] <wikibugs>	 06Operations, 10Wikimedia-General-or-Unknown, 10hardware-requests: Extend capacity for video scalers - https://phabricator.wikimedia.org/T150067#2773188 (10Matanya) probably adding a GPU might be wise as well.
[19:51:02] <matanya>	 yes robh
[19:51:22] <robh>	 yeah i'll get this working for you shortly
[19:51:26] <wikibugs>	 06Operations, 06Performance-Team, 10Thumbor: Investigate why oom_kill mtail program doesn't work properly - https://phabricator.wikimedia.org/T149980#2777696 (10fgiunchedi) Indeed that's odd, looks like mtail stops pushing metrics to graphite. Now lithium is running with a mtail version from upstream plus my...
[19:51:29] <robh>	 it's past the three days =]
[19:51:30] <matanya>	 thanks!
[19:51:40] <wikibugs>	 06Operations, 10Continuous-Integration-Infrastructure: On Trusty and Jessie PHP yields: PHP Deprecated:  Comments starting with '#' are deprecated in /etc/php5/cli/conf.d/20-xhprof.ini on line 2  - https://phabricator.wikimedia.org/T135338#2777697 (10hashar) 05declined>03Open
[19:52:15] <grrrit-wm>	 (03PS3) 10Dereckson: add projectcom.wikimedia.org for new private wiki [dns] - 10https://gerrit.wikimedia.org/r/305120 (https://phabricator.wikimedia.org/T143138) (owner: 10Dzahn)
[19:52:54] <wikibugs>	 06Operations, 10Continuous-Integration-Infrastructure: On Trusty and Jessie PHP yields: PHP Deprecated:  Comments starting with '#' are deprecated in /etc/php5/cli/conf.d/20-xhprof.ini on line 2 - https://phabricator.wikimedia.org/T135338#2295773 (10hashar) I have reopened the task, the notice is quite annoyin...
[19:53:27] <Dereckson>	 Hi mutante. Could you also take care of the DNS change please? If so, I can set it up at the same time as ec.wikimedia.org this evening.
[19:53:30] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[19:55:07] <grrrit-wm>	 (03PS1) 10RobH: granting matanya shell access [puppet] - 10https://gerrit.wikimedia.org/r/320256 (https://phabricator.wikimedia.org/T149832) 
[19:55:10] <icinga-wm>	 PROBLEM - check_redis on payments1002 is CRITICAL: CRITICAL ERROR - Can not connect to 127.0.0.1 on port 6379
[19:55:56] <grrrit-wm>	 (03CR) 10RobH: [C: 032] granting matanya shell access [puppet] - 10https://gerrit.wikimedia.org/r/320256 (https://phabricator.wikimedia.org/T149832) (owner: 10RobH)
[19:56:24] <Jeff_Green>	 grrr, check_redis on payments* is not a real problem, it's just taking forever for icinga to pick up config changes
[19:58:38] <mutante>	 Dereckson: i am aware of that and was waiting for half an hour until puppet ran across the fleet. last time we added to DNS first, clicked the URL, and then got a cached error page in varnish
[19:58:45] <wikibugs>	 06Operations, 10Ops-Access-Requests: Requesting access to fluorine for matanya - https://phabricator.wikimedia.org/T149832#2777721 (10RobH) 05Open>03Resolved a:03RobH Access granted and its now merged live on fluorine.  Depending on which bastion server you access via, it may take up to 30 minutes for th...
[19:58:52] <robh>	 matanya: access is live, what bastion would you access via?
[19:59:01] <matanya>	 3001
[19:59:01] <robh>	 im happy to manually kick puppet to run immediately on it, already did for fluorine
[19:59:12] <matanya>	 many many thanks
[19:59:14] <robh>	 running
[19:59:45] <mutante>	 Dereckson: what time is your slot?
[19:59:59] <mutante>	 i saw you are also doing "ec".wm at the same time? cool!
[20:00:10] <icinga-wm>	 PROBLEM - check_redis on payments1002 is CRITICAL: CRITICAL ERROR - Can not connect to 127.0.0.1 on port 6379
[20:00:55] <robh>	 matanya: ok, access is set up on bast3001+fluorine
[20:01:00] <robh>	 the other bastions will get them when they auto call
[20:01:02] <Dereckson>	 mutante: before the evening SWAT, 21-23:00 UTC
[20:01:03] <robh>	 you should be all set
[20:01:24] <matanya>	 thanks robh ! what is the dns name for fluorine ? wmnet ?
[20:01:30] <robh>	 eqiad.wmnet
[20:01:32] <robh>	 yep
[20:03:14] <mutante>	 Dereckson: i'll get it done in time, in a couple minutes
[20:05:10] <icinga-wm>	 PROBLEM - check_redis on payments1002 is CRITICAL: CRITICAL ERROR - Can not connect to 127.0.0.1 on port 6379
[20:05:16] <grrrit-wm>	 (03PS1) 10BBlack: maps VCL: clean up frontend file [puppet] - 10https://gerrit.wikimedia.org/r/320257 
[20:05:18] <grrrit-wm>	 (03PS1) 10BBlack: VCL: add backend_response_early hooks [puppet] - 10https://gerrit.wikimedia.org/r/320258 (https://phabricator.wikimedia.org/T131503) 
[20:05:20] <grrrit-wm>	 (03PS1) 10BBlack: Text VCL: fixup beresp.Cookie for Vary before hfp [puppet] - 10https://gerrit.wikimedia.org/r/320259 (https://phabricator.wikimedia.org/T131503) 
[20:05:22] <grrrit-wm>	 (03PS1) 10BBlack: Text VCL: avoid creating empty Cookie header [puppet] - 10https://gerrit.wikimedia.org/r/320260 (https://phabricator.wikimedia.org/T131503) 
[20:05:33] <matanya>	 i am in robh, closing the ticket as resolved, again, thank you, this will save me a lot of time (or use a lot of my time, depends on your view;))
[20:05:39] <Dereckson>	 mutante: thanks
[20:05:45] <grrrit-wm>	 (03PS2) 10Dzahn: Remove zuul.eqiad.wmnet [dns] - 10https://gerrit.wikimedia.org/r/319675 (owner: 10Hashar)
[20:06:02] <grrrit-wm>	 (03CR) 10Dzahn: [C: 032] Remove zuul.eqiad.wmnet [dns] - 10https://gerrit.wikimedia.org/r/319675 (owner: 10Hashar)
[20:06:03] <matanya>	 oh, you already closed it
[20:06:24] <grrrit-wm>	 (03Abandoned) 10Hashar: zuul: stop managing unix user/group [puppet] - 10https://gerrit.wikimedia.org/r/315902 (owner: 10Hashar)
[20:07:09] <grrrit-wm>	 (03PS1) 10Andrew Bogott: Wikistatus: Delete pages for deleted instances. [puppet] - 10https://gerrit.wikimedia.org/r/320261 (https://phabricator.wikimedia.org/T140298) 
[20:09:01] <Dereckson>	 jouncebot: refresh
[20:09:05] <jouncebot>	 I refreshed my knowledge about deployments.
[20:09:10] <icinga-wm>	 PROBLEM - Host lvs1008 is DOWN: PING CRITICAL - Packet loss = 100%
[20:09:52] <mutante>	 the host is up
[20:10:07] <grrrit-wm>	 (03PS2) 10Andrew Bogott: Wikistatus: Delete pages for deleted instances. [puppet] - 10https://gerrit.wikimedia.org/r/320261 (https://phabricator.wikimedia.org/T140298) 
[20:10:48] <grrrit-wm>	 (03PS2) 10BBlack: Text VCL: avoid creating empty Cookie header [puppet] - 10https://gerrit.wikimedia.org/r/320260 (https://phabricator.wikimedia.org/T131503) 
[20:10:51] <grrrit-wm>	 (03PS2) 10BBlack: VCL: add backend_response_early hooks [puppet] - 10https://gerrit.wikimedia.org/r/320258 (https://phabricator.wikimedia.org/T131503) 
[20:10:53] <grrrit-wm>	 (03PS2) 10BBlack: Text VCL: fixup beresp.Cookie for Vary before hfp [puppet] - 10https://gerrit.wikimedia.org/r/320259 (https://phabricator.wikimedia.org/T131503) 
[20:12:12] <godog>	 mutante: yeah one nic on e.g. lvs1008 flapped though, maybe related to some network or physical work cmjohnson1 mark ?
[20:12:20] <icinga-wm>	 RECOVERY - Host lvs1010 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms
[20:12:39] <grrrit-wm>	 (03Abandoned) 10Dereckson: More UNIX agnostic, less GNU/Linux-centric scripts [dumps] - 10https://gerrit.wikimedia.org/r/207694 (owner: 10Dereckson)
[20:12:41] <grrrit-wm>	 (03CR) 10Andrew Bogott: [C: 032] Wikistatus: Delete pages for deleted instances. [puppet] - 10https://gerrit.wikimedia.org/r/320261 (https://phabricator.wikimedia.org/T140298) (owner: 10Andrew Bogott)
[20:13:18] <cmjohnson1>	 godog: yep....it's lvs to row D
[20:13:33] <cmjohnson1>	 working on it right now
[20:13:49] <mutante>	 thanks
[20:14:16] <godog>	 cmjohnson1: ack, thanks
[20:14:31] <godog>	 !log cmjohnson1 is performing work on LVS in row D, there might be flaps
[20:14:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:15:07] <grrrit-wm>	 (03PS3) 10BBlack: Text VCL: avoid creating empty Cookie header [puppet] - 10https://gerrit.wikimedia.org/r/320260 (https://phabricator.wikimedia.org/T131503) 
[20:15:09] <grrrit-wm>	 (03PS2) 10BBlack: maps VCL: clean up frontend file [puppet] - 10https://gerrit.wikimedia.org/r/320257 
[20:15:11] <grrrit-wm>	 (03PS3) 10BBlack: VCL: add backend_response_early hooks [puppet] - 10https://gerrit.wikimedia.org/r/320258 (https://phabricator.wikimedia.org/T131503) 
[20:15:13] <grrrit-wm>	 (03PS3) 10BBlack: Text VCL: fixup beresp.Cookie for Vary before hfp [puppet] - 10https://gerrit.wikimedia.org/r/320259 (https://phabricator.wikimedia.org/T131503) 
[20:15:30] <mutante>	 jouncebot: next
[20:15:30] <jouncebot>	 In 0 hour(s) and 44 minute(s): Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161107T2100)
[20:15:30] <jouncebot>	 In 0 hour(s) and 44 minute(s): Add wikis (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161107T2100)
[20:15:30] <icinga-wm>	 PROBLEM - puppet last run on puppetmaster1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:15:35] <grrrit-wm>	 (03CR) 10BBlack: [C: 032 V: 032] maps VCL: clean up frontend file [puppet] - 10https://gerrit.wikimedia.org/r/320257 (owner: 10BBlack)
[20:15:56] <grrrit-wm>	 (03PS2) 10Filippo Giunchedi: Define Thumbor file size rlimit in firejail, not systemd [puppet] - 10https://gerrit.wikimedia.org/r/320216 (https://phabricator.wikimedia.org/T145878) (owner: 10Gilles)
[20:16:10] <grrrit-wm>	 (03CR) 10Dzahn: [C: 032] add projectcom.wikimedia.org for new private wiki [dns] - 10https://gerrit.wikimedia.org/r/305120 (https://phabricator.wikimedia.org/T143138) (owner: 10Dzahn)
[20:16:14] <grrrit-wm>	 (03PS4) 10Dzahn: add projectcom.wikimedia.org for new private wiki [dns] - 10https://gerrit.wikimedia.org/r/305120 (https://phabricator.wikimedia.org/T143138) 
[20:16:22] <grrrit-wm>	 (03CR) 10BBlack: [C: 032 V: 032] VCL: add backend_response_early hooks [puppet] - 10https://gerrit.wikimedia.org/r/320258 (https://phabricator.wikimedia.org/T131503) (owner: 10BBlack)
[20:17:01] <grrrit-wm>	 (03CR) 10BBlack: [C: 032 V: 032] Text VCL: fixup beresp.Cookie for Vary before hfp [puppet] - 10https://gerrit.wikimedia.org/r/320259 (https://phabricator.wikimedia.org/T131503) (owner: 10BBlack)
[20:17:26] <grrrit-wm>	 (03CR) 10BBlack: [C: 032 V: 032] Text VCL: avoid creating empty Cookie header [puppet] - 10https://gerrit.wikimedia.org/r/320260 (https://phabricator.wikimedia.org/T131503) (owner: 10BBlack)
[20:21:12] <mutante>	 !log projectcom.wikimedia.org created in DNS (T143138)
[20:21:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:21:19] <stashbot>	 T143138: Private wiki for Project Grants Committee - https://phabricator.wikimedia.org/T143138
[20:22:23] <grrrit-wm>	 (03PS1) 10Reedy: Set $wgOATHAuthAccountPrefix to 'Wikimedia' for WMF CA wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320266 
[20:23:39] <grrrit-wm>	 (03CR) 10Dzahn: "maybe could you -1 this until the dependencies are merged and then change it to +1 when ready to go?" [puppet] - 10https://gerrit.wikimedia.org/r/319892 (https://phabricator.wikimedia.org/T150029) (owner: 10Reedy)
[20:24:32] <grrrit-wm>	 (03CR) 10Dzahn: "adding Ariel because it's (kind of) related to the GC investigation" [puppet] - 10https://gerrit.wikimedia.org/r/317322 (https://phabricator.wikimedia.org/T148478) (owner: 10Paladox)
[20:25:31] <wikibugs>	 06Operations, 10ops-eqiad: ms-be1016 controller cache failure - https://phabricator.wikimedia.org/T150206#2777819 (10fgiunchedi)
[20:26:05] <grrrit-wm>	 (03CR) 10Dzahn: "@paladox: is it intentional that this happens only on jessie?" [puppet] - 10https://gerrit.wikimedia.org/r/316228 (https://phabricator.wikimedia.org/T39602) (owner: 10Paladox)
[20:26:27] <Reedy>	 Dereckson: If you're creating projectcom tonight... don't forget to add OATHAuth tables too
[20:26:30] <icinga-wm>	 RECOVERY - puppet last run on puppetmaster1001 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[20:26:41] <Dereckson>	 Reedy: ok
[20:26:46] <grrrit-wm>	 (03CR) 10Dzahn: "your commit message says "in php.pp" but that is not the case anymore" [puppet] - 10https://gerrit.wikimedia.org/r/316228 (https://phabricator.wikimedia.org/T39602) (owner: 10Paladox)
[20:28:23] <grrrit-wm>	 (03CR) 10Dzahn: [C: 04-1] "has this already been done in another patch? i think so, right? would need manual rebase at least" [puppet] - 10https://gerrit.wikimedia.org/r/316622 (https://phabricator.wikimedia.org/T148478) (owner: 10Paladox)
[20:28:43] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [C: 032] Define Thumbor file size rlimit in firejail, not systemd [puppet] - 10https://gerrit.wikimedia.org/r/320216 (https://phabricator.wikimedia.org/T145878) (owner: 10Gilles)
[20:28:48] <grrrit-wm>	 (03PS3) 10Filippo Giunchedi: Define Thumbor file size rlimit in firejail, not systemd [puppet] - 10https://gerrit.wikimedia.org/r/320216 (https://phabricator.wikimedia.org/T145878) (owner: 10Gilles)
[20:32:08] <bblack>	 !log repooling cp4018 (done experimenting)
[20:32:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:35:50] <grrrit-wm>	 (03PS5) 10Hashar: Add puppet-lint to Rakefile / Gemfile [puppet] - 10https://gerrit.wikimedia.org/r/288620 
[20:35:52] <grrrit-wm>	 (03PS5) 10Hashar: Only run puppet-lint against HEAD by default [puppet] - 10https://gerrit.wikimedia.org/r/288629 
[20:37:40] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Only run puppet-lint against HEAD by default [puppet] - 10https://gerrit.wikimedia.org/r/288629 (owner: 10Hashar)
[20:37:45] <hashar>	 oh my god
[20:38:33] <jynus>	 !log upgrading new labsdbs to mariadb 10.1.19
[20:38:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:39:55] <godog>	 jynus: \o/
[20:40:30] <icinga-wm>	 RECOVERY - Host lvs1008 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms
[20:40:40] <icinga-wm>	 RECOVERY - Host lvs1009 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms
[20:41:25] <bblack>	 your puppetlint fix failed puppetlint? :)
[20:43:14] <mutante>	 heh. yea.   2>&1:fatal: ambiguous argument 'HEAD^': unknown revision or path not in the working tree.
[20:44:29] <urandom>	 !log T133395: restbase2001-b.codfw.wmnet: Performing user-defined compaction of la-169239-big-Data.db and la-172629-big-Data.db
[20:44:30] <icinga-wm>	 RECOVERY - configured eth on lvs1007 is OK: OK - interfaces up
[20:44:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:44:36] <stashbot>	 T133395: Evaluate TimeWindowCompactionStrategy - https://phabricator.wikimedia.org/T133395
[20:44:40] <icinga-wm>	 RECOVERY - Host lvs1007 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms
[20:45:30] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1009 is OK: PYBAL OK - All pools are healthy
[20:53:37] <jynus>	 I am checking labsdb1009 bios- the boot order is right, but it keeps trying to boot from network
[20:55:51] <wikibugs>	 06Operations, 06Performance-Team, 10Thumbor: Avoid thumbor generating log files > 1GB - https://phabricator.wikimedia.org/T150208#2777923 (10Gilles)
[21:00:04] <jouncebot>	 gwicke, cscott, arlolra, subbu, bearND, mdholloway, halfak, Amir1, and yurik: Respected human, time to deploy Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161107T2100). Please do the needful.
[21:00:04] <jouncebot>	 Dereckson: Respected human, time to deploy Add wikis (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161107T2100). Please do the needful.
[21:00:29] <Amir1>	 I have a rather fast deployment of ORES
[21:01:40] <Amir1>	 Okay, no objections?
[21:03:37] <bearND>	 no objection from me, since I'm still prepping deployment for mobileapps
[21:04:30] <icinga-wm>	 PROBLEM - Host google is DOWN: PING CRITICAL - Packet loss = 100%
[21:04:40] <bblack>	 oh no, google is dead! :)
[21:04:48] <bblack>	 :P
[21:04:49] <bearND>	 ohohhh
[21:04:50] <icinga-wm>	 RECOVERY - Host google is UP: PING OK - Packet loss = 0%, RTA = 22.23 ms
[21:06:10] <mutante>	 racadm serveraction google up
[21:07:14] <grrrit-wm>	 (03PS1) 10Gilles: Rotate Thumbor 404 log by size, not date [puppet] - 10https://gerrit.wikimedia.org/r/320273 (https://phabricator.wikimedia.org/T150208) 
[21:08:43] <grrrit-wm>	 (03CR) 10Chad: [C: 04-1] "Yes, this was already done. If we need extra options for the JVM, they should be added to the array in jetty.pp, not appended here." [puppet] - 10https://gerrit.wikimedia.org/r/316622 (https://phabricator.wikimedia.org/T148478) (owner: 10Paladox)
[21:08:53] <Amir1>	 !log deploying c61b9c1 from ORES into canary nodes (T149730)
[21:09:08] <grrrit-wm>	 (03CR) 10Chad: "This is actually probably fine, but will require a gerrit restart." [puppet] - 10https://gerrit.wikimedia.org/r/317322 (https://phabricator.wikimedia.org/T148478) (owner: 10Paladox)
[21:09:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:09:13] <stashbot>	 T149730: Deploy logging changes to ORES - https://phabricator.wikimedia.org/T149730
[21:12:26] <Amir1>	 !log deploying c61b9c1 from ORES to all nodes (T149730)
[21:12:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:18:04] <wikibugs>	 06Operations, 10ops-eqiad, 10DBA: labsdb1009 boot issues (power supply and controller?) - https://phabricator.wikimedia.org/T150211#2778014 (10jcrespo)
[21:18:04] <arlolra>	 !log starting Parsoid deploy
[21:18:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:19:09] <grrrit-wm>	 (03PS1) 10Madhuvishy: labstore: Add minute to crontab for backups to secondary DC [puppet] - 10https://gerrit.wikimedia.org/r/320276 
[21:20:42] <grrrit-wm>	 (03CR) 10Madhuvishy: [C: 032] labstore: Add minute to crontab for backups to secondary DC [puppet] - 10https://gerrit.wikimedia.org/r/320276 (owner: 10Madhuvishy)
[21:21:10] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 659 600 - REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3079377 keys, up 7 days 12 hours - replication_delay is 659
[21:22:10] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3077976 keys, up 7 days 13 hours - replication_delay is 0
[21:29:03] <grrrit-wm>	 (03PS2) 10Dereckson: Initial configuration for ec.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314471 (https://phabricator.wikimedia.org/T135521) 
[21:29:24] <arlolra>	 !log updated Parsoid to version 2c2fe425
[21:29:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:30:12] <bearND>	 !log starting mobileapps deploy
[21:30:15] <grrrit-wm>	 (03PS3) 10Dereckson: Initial configuration for ec.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314471 (https://phabricator.wikimedia.org/T135521) 
[21:30:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:30:45] <grrrit-wm>	 (03CR) 10Dereckson: "PS2: rebased (solved merge conflict against ptwikimedia logo). PS3: +wikiversions.json" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314471 (https://phabricator.wikimedia.org/T135521) (owner: 10Dereckson)
[21:32:17] <Amir1>	 okay, the ores deployment is done
[21:35:11] <bearND>	 Amir1: have you logged it in SAL?
[21:35:26] <Amir1>	 bearND: Yup at the top
[21:35:37] <Amir1>	 I log the start not the finish, I hope that's okay
[21:36:12] <bearND>	 Amir1: People usually log the start and the end
[21:36:34] <Amir1>	 okay, I'll do it
[21:36:38] <bearND>	 to better correlate with any potential icinga alerts, i think
[21:37:07] <Amir1>	 !log ores deployment c61b9c1 is done
[21:37:28] <Amir1>	 bearND: Sure, thanks
[21:37:33] <bearND>	 👍
[21:37:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
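Note on the exchange above: bearND's point is that logging both the start and the end of a deployment makes it easier to line up icinga alerts with it. A minimal sketch of that check follows, using timestamps copied from this log. The helper names and the `slack_minutes` parameter are assumptions for illustration. An alert falling inside a window only shows overlap in time, not that the deployment caused it.

```python
from datetime import datetime, timedelta


def to_time(hhmmss):
    """Parse an HH:MM:SS log timestamp (same-day comparison only)."""
    return datetime.strptime(hhmmss, "%H:%M:%S")


def alerts_in_window(alerts, start, end, slack_minutes=5):
    """Return alert messages whose timestamp falls between start and end + slack."""
    lo = to_time(start)
    hi = to_time(end) + timedelta(minutes=slack_minutes)
    return [msg for ts, msg in alerts if lo <= to_time(ts) <= hi]


# Alert timestamps as they appear in this log.
ALERTS = [
    ("21:21:10", "PROBLEM - Redis status tcp_6479 on rdb2006"),
    ("21:40:30", "PROBLEM - Esams HTTP 5xx reqs/min on graphite1001"),
]

if __name__ == "__main__":
    # ORES deployment: start logged at 21:08:53, end logged at 21:37:07.
    for msg in alerts_in_window(ALERTS, "21:08:53", "21:37:07"):
        print(msg)
```

Without the end entry, the upper bound of the window would have to be guessed, which is the gap the second `!log` line closes.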
[21:38:22] <bearND>	 !log deployed mobileapps 4202cbb
[21:38:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:40:30] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[21:50:42] <gehel>	 !log deploying latest wdqs gui and blazegraph
[21:50:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:53:28] <gehel>	 SMalyshev: wdqs deployment completed, tests are green...
[21:56:34] <grrrit-wm>	 (03PS1) 10Dereckson: Initial configuration for projectcom.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320298 (https://phabricator.wikimedia.org/T143138) 
[21:57:35] <Dereckson>	 You're all done for services deployment?
[21:58:33] <grrrit-wm>	 (03PS2) 10Dzahn: remove gallium from site.pp, installserver [puppet] - 10https://gerrit.wikimedia.org/r/318216 (https://phabricator.wikimedia.org/T95757) 
[21:59:50] <grrrit-wm>	 (03CR) 10Dereckson: [C: 032] Initial configuration for ec.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314471 (https://phabricator.wikimedia.org/T135521) (owner: 10Dereckson)
[22:00:04] <jouncebot>	 dapatrick, bawolff, and Reedy: Respected human, time to deploy Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161107T2200). Please do the needful.
[22:00:28] <grrrit-wm>	 (03Merged) 10jenkins-bot: Initial configuration for ec.wikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314471 (https://phabricator.wikimedia.org/T135521) (owner: 10Dereckson)
[22:03:40] <Dereckson>	 !log Starting ec.wikimedia.org wiki creation
[22:03:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:07:54] <Reedy>	 Dereckson: Let me know when you're done... Couple of patches to deploy for security
[22:08:05] * Dereckson nods
[22:08:23] <grrrit-wm>	 (03CR) 10Dzahn: [C: 032] remove gallium from site.pp, installserver [puppet] - 10https://gerrit.wikimedia.org/r/318216 (https://phabricator.wikimedia.org/T95757) (owner: 10Dzahn)
[22:09:41] <Reedy>	 make that 3 :)
[22:10:28] <mutante>	 !log gallium - revoke puppet cert, deactivate node
[22:10:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:12:09] <mutante>	 almost deleted all salt keys, thanks salt-key for asking [N/y]
[22:12:20] <Zppix>	 lol
[22:13:26] <mutante>	 -d  and -D
[22:14:30] <Krenair>	 Dereckson, any errors from the script?
[22:14:37] <Dereckson>	 Krenair: yes, but solved at pass 2
[22:14:50] <Dereckson>	 was Cirrus this time, at populate step
[22:15:11] <mutante>	 !log gallium - delete salt key, minion is stopped
[22:15:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:15:52] <Dereckson>	 so database okay, let's sync the config 
[22:17:32] <logmsgbot>	 !log dereckson@tin Synchronized dblists: (no message) (duration: 00m 53s)
[22:17:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:17:40] <icinga-wm>	 PROBLEM - salt-minion processes on gallium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[22:17:59] <mutante>	 ^ yea, that should have been gone after the "node deactivate" step above
[22:18:01] <logmsgbot>	 !log dereckson@tin rebuilt wikiversions.php and synchronized wikiversions files: (no message)
[22:18:03] <mutante>	 hrmm
[22:18:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:18:34] <mutante>	 !log gallium - stopped apache, stopped salt, removed zuul cronjob
[22:18:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:19:10] <apergos>	 wow there goes gallium, yay!
[22:19:24] <Zppix>	 mutante don't touch my salt stash :P 
[22:19:45] <apergos>	 that's actually _my_ salt stash, if you wanna get picky about it :-P
[22:20:01] <mutante>	 heheh
[22:20:16] <apergos>	 anyways, a long time in coming, great to see it finishing up
[22:20:34] <mutante>	 :)
[22:20:39] <Zppix>	 apergos i would reply in a funny manner but unfortunately there's a shit ton going on so i don't think that would make some people happy with me 
[22:20:48] <logmsgbot>	 !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: ec.wikimedia initial configuration (T135521) (duration: 00m 47s)
[22:20:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:20:53] <apergos>	 yep let's let em work
[22:20:54] <stashbot>	 T135521: Internal Wiki for Wikimedians of Ecuador - https://phabricator.wikimedia.org/T135521
[22:21:09] <grrrit-wm>	 (03CR) 10Krinkle: Rotate Thumbor 404 log by size, not date (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/320273 (https://phabricator.wikimedia.org/T150208) (owner: 10Gilles)
[22:21:40] <mutante>	 !icinga tell einsteinium and tegmen to remove gallium
[22:22:24] <logmsgbot>	 !log dereckson@tin Synchronized static/images/project-logos/: Logos for ec.wikimedia (T135521) (duration: 00m 48s)
[22:22:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:24:26] <wikibugs>	 06Operations, 10ops-codfw, 06DC-Ops, 13Patch-For-Review, 07Wikimedia-Incident: Labstore2001 controller or shelf failure - https://phabricator.wikimedia.org/T102626#2778227 (10madhuvishy) Labstore2001 is up and running with 12 internal disks connected via H700 raid controller, and 48 external disks across...
[22:25:44] <Dereckson>	 !log Created tables for OATHAuth on ec.wikimedia
[22:25:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:26:07] <Zppix>	 Dereckson what is this "ec.wikimedia"?
[22:26:18] <Dereckson>	 Wikimedistas de Ecuador
[22:27:30] <Dereckson>	 Reedy: I've run addwiki (created the tables, sent the mail to the mailing list), sync'ed dblist, wikiversions, wmf-config/InitialiseSettings.php, but wiki doesn't appear at https://ec.wikimedia.org/
[22:28:19] <Dereckson>	 What are the conditions to serve a wiki there?
[22:28:44] <Dereckson>	 ecwikimedia exists, check / wikiversion check
[22:29:42] <Dereckson>	 mwrepl ecwikimedia → echo $wgServerName correctly gives me ec.wikimedia.org too
[22:30:51] <Dereckson>	 ah
[22:30:52] <Dereckson>	 For *wikimedia databases, add the subdomain to the list in MWMultiVersion::setSiteInfoForWiki
[22:31:07] <Zppix>	 Dereckson are DNS and stuff set up?
[22:31:11] <Dereckson>	 Zppix: yup
[22:31:22] <Krenair>	 Zppix, please leave Dereckson alone, he has important things to do
[22:31:34] <Zppix>	 Krenair im trying to help him out with why it wont show
[22:31:44] <mutante>	 Zppix: it's for Ecuador user group and yep, i added to DNS 
[22:32:43] <Reedy>	 You wouldn't get "no wiki found" with no dblist
[22:32:46] <Reedy>	 *DNS
[22:33:02] <grrrit-wm>	 (03PS1) 10Dereckson: Add ec.wikimedia to MWMultiVersion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320302 (https://phabricator.wikimedia.org/T135521) 
[22:33:24] <grrrit-wm>	 (03CR) 10Reedy: [C: 032] Add ec.wikimedia to MWMultiVersion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320302 (https://phabricator.wikimedia.org/T135521) (owner: 10Dereckson)
[22:33:25] <Zppix>	 Reedy, wow how'd i miss that part of the message 
[22:33:53] <mutante>	 "Please specify a valid Host header."
[22:34:01] <grrrit-wm>	 (03Merged) 10jenkins-bot: Add ec.wikimedia to MWMultiVersion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320302 (https://phabricator.wikimedia.org/T135521) (owner: 10Dereckson)
[22:34:23] <Dereckson>	 Works on mw1099
[22:34:59] <Reedy>	 sync it out then :)
[22:35:32] <mutante>	 https://ec.wikimedia.org
[22:35:32] <mutante>	  * 200 OK 10518
[22:35:44] <logmsgbot>	 !log dereckson@tin Synchronized multiversion/MWMultiVersion.php: Add ec.wikimedia to MWMultiVersion (T135521) (duration: 00m 49s)
[22:35:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:35:50] <stashbot>	 T135521: Internal Wiki for Wikimedians of Ecuador - https://phabricator.wikimedia.org/T135521
[22:35:59] <Reedy>	 Magic
[22:36:11] <mutante>	 it's there :) nice
[22:36:25] <Dereckson>	 mutante: yes, for *.wikimedia.org, there is a need to declare the code in the script calling MediaWiki
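The requirement Dereckson describes can be sketched roughly as below. This is a hypothetical illustration in Python, not the actual code of MWMultiVersion::setSiteInfoForWiki: the names `WIKIMEDIA_SUBDOMAINS` and `db_name_for_host` are invented for the example, and the set holds only the one subdomain discussed here. The point is that a *.wikimedia.org host resolves to a wiki only if its subdomain is explicitly listed, which would explain why DNS, dblist and wikiversions alone were not enough.

```python
# Hypothetical sketch; NOT the real MWMultiVersion::setSiteInfoForWiki logic.
# Assumption: subdomains of wikimedia.org must be listed explicitly.
WIKIMEDIA_SUBDOMAINS = {"ec"}  # illustrative; the real list is not shown in the log


def db_name_for_host(host, listed=WIKIMEDIA_SUBDOMAINS):
    """Return a database name for a listed *.wikimedia.org host, else None."""
    labels = host.lower().split(".")
    # Only hosts of the form <sub>.wikimedia.org are considered here.
    if len(labels) == 3 and labels[1:] == ["wikimedia", "org"]:
        sub = labels[0]
        if sub in listed:
            # "ec" -> "ecwikimedia", matching the database name in the log.
            return sub + "wikimedia"
    # Unlisted subdomain: no wiki is found for this host.
    return None
```

With the subdomain listed, `db_name_for_host("ec.wikimedia.org")` yields `ecwikimedia`; an unlisted subdomain yields `None`, the analogue of the "no wiki found" response.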
[22:36:39] <apergos>	 ah login required. I see.  working at any rate
[22:36:58] <Dereckson>	 So, can stewards take care of fixing login issues for private wikis, or is something required there?
[22:37:51] <Reedy>	 https://ec.wikimedia.org/ doesn't seem to be redirecting to https://ec.wikimedia.org/wiki/P%C3%A1gina_principal
[22:38:01] <Reedy>	 But that's minor
[22:38:19] <Reedy>	 Dereckson: And no, for a private wiki, stewards can't help
[22:38:20] <apergos>	 it sends you to the special:login page
[22:38:30] <apergos>	 and that's expected, Reedy
[22:38:49] <Reedy>	 If you visit https://office.wikimedia.org
[22:39:07] <Reedy>	 it redirects you to https://office.wikimedia.org/wiki/Main_Page
[22:39:10] <mutante>	 !log Un nuevo wiki ha nacido. Bienvenido grupo de usuarios Ecuador Wikimedia. https://ec.wikimedia.org (T135521)
[22:39:14] <Reedy>	 But I guess that might've been whitelisted
[22:39:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:39:21] * mutante "tweets"
[22:39:22] <matanya>	 sorry Dereckson can't help
[22:39:31] <apergos>	 I am guessing so
[22:39:33] <Reedy>	 Dereckson: Need to find out who needs an account.. And create them one as 'crat
[22:39:48] <Reedy>	 using createAndPromote.php
[22:39:56] <Reedy>	 But that's minor and can be done by any shell user
[22:40:30] <Dereckson>	 Reedy: right, I'm done in this case, you can deploy the security fixes. I'll do interwiki and cleanup afterwards.
[22:40:46] <grrrit-wm>	 (PS2) Reedy: Don't override message key in badpass log entries [mediawiki-config] - https://gerrit.wikimedia.org/r/319134 (owner: Brian Wolff)
[22:40:51] <grrrit-wm>	 (CR) Reedy: [C: 2] Don't override message key in badpass log entries [mediawiki-config] - https://gerrit.wikimedia.org/r/319134 (owner: Brian Wolff)
[22:40:54] <Krenair>	 All stewards can do with private wikis is grant/remove groups from their users
[22:41:25] <matanya>	 only if it is part of the main cluster
[22:41:28] <grrrit-wm>	 (Merged) jenkins-bot: Don't override message key in badpass log entries [mediawiki-config] - https://gerrit.wikimedia.org/r/319134 (owner: Brian Wolff)
[22:41:33] <icinga-wm>	 ACKNOWLEDGEMENT - salt-minion processes on gallium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion daniel_zahn decom T95757
[22:42:43] <grrrit-wm>	 (PS2) Reedy: Set $wgOATHAuthAccountPrefix to 'Wikimedia' for WMF CA wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/320266
[22:42:47] <grrrit-wm>	 (CR) Reedy: [C: 2] Set $wgOATHAuthAccountPrefix to 'Wikimedia' for WMF CA wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/320266 (owner: Reedy)
[22:43:27] <grrrit-wm>	 (Merged) jenkins-bot: Set $wgOATHAuthAccountPrefix to 'Wikimedia' for WMF CA wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/320266 (owner: Reedy)
[22:44:47] <apergos>	 Reedy: checked a bunch of other private wikis, they all have main page visible, so dunno.  
[22:44:58] <logmsgbot>	 !log reedy@tin Synchronized wmf-config/CommonSettings.php: Set wgOATHAuthAccountPrefix and Don't override message key in badpass log entries (duration: 00m 47s)
[22:45:02] <Reedy>	 It'll be something or nothing
[22:45:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:51:22] <logmsgbot>	 !log reedy@tin Synchronized php-1.29.0-wmf.1/includes/specials/: Deploy security fix T150044 (duration: 00m 54s)
[22:51:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:52:54] <Reedy>	 Dereckson: I think you can continue now
[22:53:15] * Dereckson nods
[22:53:23] <Reedy>	 Thanks
[22:53:59] <apergos>	 looks like all private wikis should have main page whitelisted so I'm outa ideas
[22:54:05] <apergos>	 	'private' => [ 'Main Page', 'Special:UserLogin', 'Special:UserLogout' ],
[22:54:16] <Reedy>	 Page doesn't exist or similar maybe?
[22:54:25] <apergos>	 mm I thought by default there was
[22:54:31] <Reedy>	 It should be
[22:56:24] <Dereckson>	 they can create a redirect from [[Main Page]] to Pagina principal in this case
[22:56:30] <Dereckson>	 and that will solve the issue.
[22:56:40] <icinga-wm>	 PROBLEM - Apache HTTP on mw1277 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.004 second response time
[22:56:47] <Reedy>	 Dereckson: Oh, no
[22:56:57] <Reedy>	 Actually, yes
[22:56:59] <Reedy>	 Something like that
[22:57:40] <icinga-wm>	 RECOVERY - Apache HTTP on mw1277 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.019 second response time
[22:58:14] <Reedy>	 Or change MediaWiki:mainpage to Main_Page
[22:59:21] <apergos>	 there is a Main_Page there
[22:59:25] <apergos>	 so say the dumps :-P
[22:59:50] <apergos>	 maybe it's a matter of the content language, like you say
[22:59:59] <Reedy>	 MW doing something bizarre?
[23:00:04] <Reedy>	 Well I never
[23:00:06] * Reedy grins
[23:00:15] <apergos>	 heh
[23:00:20] <Dereckson>	 en/es issue?
[23:00:26] <apergos>	 prolly
[23:00:36] <apergos>	 been way too long since I've been involved with one of these though
[23:00:43] <apergos>	 adding a wiki, that is
[23:01:25] <apergos>	 https://el.wiktionary.org/w/index.php?title=Main_Page&redirect=no
[23:01:28] <apergos>	 yep redirect
[23:03:06] <apergos>	 mediawiki could be smart enough to check content language and make the Main_Page redirect to the placeholder content inserted at $translated_title  if content_language is not en...
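The mismatch the channel converges on can be sketched as follows. This is a simplified, hypothetical model in Python, not MediaWiki's actual permission code: `normalize`, `anonymous_can_read` and `landing_for_anonymous` are invented names, and only the whitelist entries come from the configuration line quoted above. It assumes the whitelist is compared against the localized main page title, so a Spanish-language wiki whose main page is "Página principal" would miss the 'Main Page' entry and fall through to the login page.

```python
# Hypothetical model of a private wiki's read whitelist; not MediaWiki's real code.
# The three entries are the ones quoted from the 'private' config line in the log.
READ_WHITELIST = ["Main Page", "Special:UserLogin", "Special:UserLogout"]


def normalize(title):
    """Treat underscores and spaces as equivalent, as in wiki page titles."""
    return title.replace("_", " ").strip()


def anonymous_can_read(title, whitelist=READ_WHITELIST):
    """True if a logged-out visitor may read the given page title."""
    return normalize(title) in {normalize(t) for t in whitelist}


def landing_for_anonymous(main_page_title):
    """Where a logged-out visitor to the site root would end up."""
    if anonymous_can_read(main_page_title):
        return main_page_title
    # Not whitelisted: the visitor is sent to the login page instead.
    return "Special:UserLogin"
```

Under this model, an English-language private wiki lands on `Main_Page`, while one whose main page title is "Página principal" lands on `Special:UserLogin`. Either remedy suggested in the channel (a redirect from Main Page, or pointing MediaWiki:mainpage at Main_Page) would make the title match the whitelist again.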
[23:05:42] <Zppix>	 apergos i agree 
[23:06:32] <apergos>	 that btw is probably the fastest the first dump of a wiki has ever been delivered :-P
[23:06:43] <apergos>	 all right, I'm out.. see folks on us election day :-P
[23:07:49] <mutante>	 nice apergos :) 
[23:12:39] <subbu>	 mutante, i am going to try and upload a new parsoid deb (0.6.0) with the latest master .. do you know if it will delete 0.5.3 or retain it? Last time, the 0.5* and 0.4.1 were left behind.
[23:13:09] <grrrit-wm>	 (PS1) Dereckson: Update interwiki map for vote. and ec.wikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/320308
[23:13:30] <grrrit-wm>	 (CR) Dereckson: [C: 2] Update interwiki map for vote. and ec.wikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/320308 (owner: Dereckson)
[23:14:01] <grrrit-wm>	 (Merged) jenkins-bot: Update interwiki map for vote. and ec.wikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/320308 (owner: Dereckson)
[23:14:13] <mutante>	 subbu: i think it will retain it but users installing will all get the latest version 
[23:14:29] <subbu>	 that is fine.
[23:14:48] <subbu>	 just in case someone wants just a security update and not a whole update .. good to have the 0.5.3 around for a bit.
[23:14:52] <subbu>	 thanks. will upload now.
[23:15:05] <Krinkle>	 !log mwscript --deleteEqualMessages.php --wiki gawiktionary (T45917)
[23:15:08] <Dereckson>	 New interwiki map works on mw1099, syncing.
[23:15:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:15:10] <stashbot>	 T45917: Delete all redundant "MediaWiki" pages for system messages - https://phabricator.wikimedia.org/T45917
[23:16:20] <subbu>	 mutante, it removed 0.5.3 :( 
[23:16:27] <logmsgbot>	 !log dereckson@tin Synchronized wmf-config/interwiki.php: Update interwiki map for vote. and ec.wikimedia ([[Gerrit:320308]]) (duration: 00m 47s)
[23:16:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:16:49] <subbu>	 i guess we'll have to figure out where to host our old deb releases .. since the next release after 0.6.0 will be a big breaking release.
[23:19:29] <mutante>	 subbu: hrmm.. but like you said, in the past the old versions stuck around and they were only deleted with a manual command .. maybe it has to be a new distribution
[23:19:30] <Dereckson>	 !log Created storage container for ec.wikimedia (private)
[23:19:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:19:53] <mutante>	 subbu: maybe "jessie-mediawiki-old" is needed 
[23:20:23] <subbu>	 mutante, hmm ... looks like varnish might need a purge again .. i am still getting a 0.5.3 when i do a sudo apt-get install parsoid
[23:22:41] <Dereckson>	 !log ec.wikimedia.org wiki creation done
[23:22:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:22:53] <Dereckson>	 !log Starting projectcom.wikimedia.org wiki creation
[23:22:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:25:04] <grrrit-wm>	 (PS2) Dereckson: Initial configuration for projectcom.wikimedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/320298 (https://phabricator.wikimedia.org/T143138)
[23:25:20] <icinga-wm>	 PROBLEM - HHVM rendering on mw1207 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.002 second response time
[23:25:21] <mutante>	 subbu: i'll try to find out the commands that godog ran
[23:25:27] <subbu>	 k
[23:26:20] <icinga-wm>	 RECOVERY - HHVM rendering on mw1207 is OK: HTTP OK: HTTP/1.1 200 OK - 76130 bytes in 0.349 second response time
[23:27:07] <grrrit-wm>	 (CR) Dereckson: [C: 2] Initial configuration for projectcom.wikimedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/320298 (https://phabricator.wikimedia.org/T143138) (owner: Dereckson)
[23:27:41] <grrrit-wm>	 (Merged) jenkins-bot: Initial configuration for projectcom.wikimedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/320298 (https://phabricator.wikimedia.org/T143138) (owner: Dereckson)
[23:30:22] <godog>	 mutante: they'd be on neodymium for the purges, odd though I don't remember having this problem
[23:32:47] <Krinkle>	 !log mwscript --deleteEqualMessages.php --wiki jawikibooks (T45917)
[23:32:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:32:52] <stashbot>	 T45917: Delete all redundant "MediaWiki" pages for system messages - https://phabricator.wikimedia.org/T45917
[23:34:29] <Dereckson>	 Krenair: we've got a winner! addWiki completed successfully on the first run.
[23:34:42] <Krenair>	 Dereckson, congratulations
[23:35:07] <Krenair>	 I'm not going to count this one because it's apparently a double wiki creation day :p
[23:35:18] <Dereckson>	 !log projectcomwiki database created
[23:35:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:35:36] <Krenair>	 you fixed the errors that showed up on the first creation?
[23:36:10] <Dereckson>	 No, I'll file a bug later about that one
[23:36:27] <grrrit-wm>	 (PS1) BBlack: VCL: retry explicit 503 once as well [puppet] - https://gerrit.wikimedia.org/r/320310
[23:37:38] <Krenair>	 Dereckson, so how did you get it to work the second time round?
[23:37:57] <bblack>	 ema: merging https://gerrit.wikimedia.org/r/#/c/320310/ because it seems pretty legit, and will likely reduce the minor 503 spikes we see on e.g. upload backend restarts, etc
[23:38:01] <bblack>	 oops
[23:38:42] <grrrit-wm>	 (CR) BBlack: [C: 2] VCL: retry explicit 503 once as well [puppet] - https://gerrit.wikimedia.org/r/320310 (owner: BBlack)
[23:39:04] <Dereckson>	 Krenair: running it again, commenting out the steps already done, and that time the Elasticsearch index completed
[23:41:17] <logmsgbot>	 !log dereckson@tin Synchronized dblists/: Added projectcomwiki (duration: 00m 48s)
[23:41:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:42:07] <logmsgbot>	 !log dereckson@tin rebuilt wikiversions.php and synchronized wikiversions files: Added projectcomwiki
[23:42:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:42:31] <Dereckson>	 https://projectcom.wikimedia.org/wiki/Main_Page is live 
[23:43:36] <logmsgbot>	 !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Initial configuration for projectcom.wikimedia.org (duration: 00m 53s)
[23:43:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:44:57] <Dereckson>	 !log Created storage container for projectcomwiki (private)
[23:45:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:50:39] <Krinkle>	 !log mwscript --deleteEqualMessages.php --wiki jawikinews (T45917)
[23:50:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:50:44] <stashbot>	 T45917: Delete all redundant "MediaWiki" pages for system messages - https://phabricator.wikimedia.org/T45917
[23:54:30] <icinga-wm>	 RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[23:55:02] <godog>	 !log delete parsoid from releases.wikimedia.org and varnish-ban on cache_misc
[23:55:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:55:16] <Dereckson>	 !log Created 'Mjohnson (WMF)' user account on projectcom.wikimedia.org as bureaucrat
[23:55:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:59:55] <AndyRussG>	 Hi!