[00:01:51] <wikibugs>	 (03CR) 10Paladox: [C: 031] gerrit-ssh: don't listen on all interfaces, disable on slaves [puppet] - 10https://gerrit.wikimedia.org/r/354074 (owner: 10Dzahn)
[00:02:25] <wikibugs>	 10Operations, 10Traffic, 10netops: Japanese hotel resolving to esams and going the long way round - https://phabricator.wikimedia.org/T178726#3700933 (10Reedy) p:05Triage>03Low Mobile WiFi on the bus I'm currently on is resolving to ulsfo... So not broken for the whole country ;)  Hopefully just one ISP
[00:02:56] <wikibugs>	 (03CR) 10Dzahn: [C: 031] "http://puppet-compiler.wmflabs.org/8408/" [puppet] - 10https://gerrit.wikimedia.org/r/354074 (owner: 10Dzahn)
[00:04:00] <wikibugs>	 (03PS11) 10Dzahn: gerrit-ssh: don't listen on all interfaces, disable on slaves [puppet] - 10https://gerrit.wikimedia.org/r/354074
[00:04:02] <wikibugs>	 (03CR) 10Paladox: [C: 04-1] "Fails the diff, it is removing the port." [puppet] - 10https://gerrit.wikimedia.org/r/354074 (owner: 10Dzahn)
[00:04:13] <paladox>	 mutante you removed the port
[00:04:14] <paladox>	 :)
[00:04:35] <wikibugs>	 (03CR) 10Paladox: [C: 031] gerrit-ssh: don't listen on all interfaces, disable on slaves (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/354074 (owner: 10Dzahn)
[00:04:45] <wikibugs>	 (03PS12) 10Dzahn: gerrit-ssh: don't listen on all interfaces, disable on slaves [puppet] - 10https://gerrit.wikimedia.org/r/354074
[00:05:03] <mutante>	 paladox: yes, i did
[00:05:09] <mutante>	 19:57 < mutante> :'port' may be omitted to use the default of 29418.
[00:05:16] <paladox>	 oh ok
[00:05:35] <no_justification>	 I prefer being explicit :)
[00:06:35] <icinga-wm>	 PROBLEM - tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 20 seconds
[00:06:38] <wikibugs>	 (03PS13) 10Dzahn: gerrit-ssh: don't listen on all interfaces, disable on slaves [puppet] - 10https://gerrit.wikimedia.org/r/354074
[00:06:44] <mutante>	 ok.. ok. it's back 
[00:06:57] <wikibugs>	 (03CR) 10Paladox: [C: 031] gerrit-ssh: don't listen on all interfaces, disable on slaves [puppet] - 10https://gerrit.wikimedia.org/r/354074 (owner: 10Dzahn)
[00:07:01] <mutante>	 paladox: wow, i used inline editor :p
[00:07:04] <paladox>	 :)
[00:08:31] <paladox>	 no_justification wondering could you add me to this https://gerrit-review.googlesource.com/#/admin/groups/uuid-819ed1064786ed5c11fc9a1fe617b0103fd18d03 groups please.
[00:08:35] <icinga-wm>	 RECOVERY - tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 579 bytes in 16.392 second response time
[00:09:13] <no_justification>	 paladox: Nope cuz it won't load for me :p
[00:09:17] <paladox>	 oh
[00:09:17] <no_justification>	 Stuck "Loading...."
[00:09:20] <no_justification>	 Hehe
[00:09:25] <paladox>	 i thought you had +2 on there
[00:09:27] <no_justification>	 Also it's 5pm on a friday, it's beer o'clock :)
[00:09:28] <paladox>	 ah
[00:09:36] <paladox>	 it's probaly slow.
[00:09:43] <paladox>	 and ok
[00:09:47] <no_justification>	 admin/groups won't even load for me rn
[00:10:14] <no_justification>	 Ah at the bottom: "Server Error: uuid-819ed1064786ed5c11fc9a1fe617b0103fd18d03"
[00:10:22] <no_justification>	 Soooo, probably don't have permission to view?
[00:10:24] <paladox>	 oh isee
[00:11:41] <no_justification>	 Funny, this works: https://gerrit-review.googlesource.com/admin/groups/904
[00:11:45] <no_justification>	 Just not by UUID
[00:11:50] <no_justification>	 Prolly a bug? :)
[00:12:30] <paladox>	 oh lol
[00:12:32] <paladox>	 it's a poly bug
[00:12:38] <paladox>	 i desgned the page
[00:12:44] <icinga-wm>	 PROBLEM - tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 20 seconds
[00:12:53] <no_justification>	 Anyway, remind me monday or something
[00:12:55] * no_justification dips out
[00:12:58] <paladox>	 ok
[00:13:34] <icinga-wm>	 RECOVERY - tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 579 bytes in 6.990 second response time
[00:14:24] * paladox looks for a fix for the poly bug
[00:14:35] <paladox>	 ah it's the router
[00:14:55] <paladox>	 not actually the router but the router in poly it's called that.
[00:19:29] <wikibugs>	 10Operations, 10Cloud-Services, 10Community-Wikimetrics, 10DBA, and 2 others: Evaluate future of wmf puppet module "mysql" - https://phabricator.wikimedia.org/T165625#3272009 (10Dzahn) I recently looked at this and found that quarry is one of the few (or the only) modules currently using this.  I attempted...
[00:22:34] <wikibugs>	 10Operations, 10Cloud-Services, 10Community-Wikimetrics, 10DBA, and 2 others: Evaluate future of wmf puppet module "mysql" - https://phabricator.wikimedia.org/T165625#3700943 (10Dzahn) ``` modules/quarry/manifests/database.pp:    class { '::mysql::server': modules/role/manifests/wikimetrics/staging.pp:...
[00:23:10] <tzatziki>	 Can anyone purge the mobile app's cache?
[00:23:17] <tzatziki>	 currently showing a big ol' porn image
[00:23:59] <tzatziki>	 maybe mutante ?
[00:24:24] <Jamesofur>	 https://wikitech.wikimedia.org/wiki/MobileFrontend#Flushing_the_cache perhaps
[00:24:33] <mutante>	 sorry, i don't know how
[00:24:47] <mutante>	 but you said app?
[00:25:23] <tzatziki>	 yea
[00:26:01] <mutante>	 the docs reference "fenari". that's not a good sign
[00:26:13] <mutante>	 because that doesnt exist since years
[00:26:33] <legoktm>	 that's...definitely not the right cache
[00:27:00] <legoktm>	 it's the mobileapps service I believe
[00:27:18] <legoktm>	 https://wikitech.wikimedia.org/wiki/Mobileapps_(service) isn't very helpful :(
[00:27:59] <Jamesofur>	 yeah, the cache on the article itself seems fine it's the main page that's the real problem
[00:28:04] <mutante>	 so we should contact one of https://www.mediawiki.org/wiki/Wikimedia_Reading_Infrastructure_team  ? 
[00:28:43] <Jamesofur>	 Yeah, probably, tzatziki is pinging Josh too who may have a sense
[00:29:16] <bearND>	 tzatziki: Which page? If it's  single page then usually and action=purge does help.
[00:29:32] <tzatziki>	 bearND: the main page
[00:29:33] <bearND>	 If it's part of the feed that's another story.
[00:29:34] <wikibugs>	 10Operations, 10Cloud-Services, 10Community-Wikimetrics, 10DBA, and 2 others: Evaluate future of wmf puppet module "mysql" - https://phabricator.wikimedia.org/T165625#3272009 (10zhuyifei1999) >>! In T165625#3700943, @Dzahn wrote: > quarry (it's on trusty and has the precise repos, how does that even work r...
[00:29:35] <tzatziki>	 on the iOS app
[00:29:50] <bearND>	 tzatziki: what language?
[00:30:16] <Jamesofur>	 bearND: English main feed I think. It's actually showing the old featured article (on the enWP page that changed) 
[00:30:27] <Jamesofur>	 (changed at midnight utc0
[00:31:03] <Jamesofur>	 the actual article is fine/not showing the image it's the front page/feed cache that's the issue
[00:32:01] <bearND>	 Jamesofur: oh, ok. I see it now. It's the previous days TFA.
[00:32:14] <Jamesofur>	 yeah
[00:32:17] <paladox>	 no_justification: https://gerrit-review.googlesource.com/#/c/gerrit/+/135550/ :)
[00:33:25] <bearND>	 probably similar to https://phabricator.wikimedia.org/T174993
[00:34:51] <bearND>	 since it's pageimages related.
[00:35:57] <wikibugs>	 (03PS1) 10Dzahn: rm requesttracker::labs class [puppet] - 10https://gerrit.wikimedia.org/r/385495
[00:40:35] <Jamesofur>	 bearND: Yeah, could be, https://en.wikipedia.org/w/api.php?action=query&prop=pageimages&pilicense=any&titles=Boogeyman_2 seems to show the right image atm though not sure if it always has (just checked for the first time)
[00:41:00] <bearND>	 Not sure why it's still showing up in the app. https://en.wikipedia.org/api/rest_v1/feed/featured/2017/10/20 seems to be updated to show the correct thumbnail.
[00:42:11] <bearND>	 ok, on my desktop browser it's updated, but my Android device still shows the old version.
[00:42:23] <Jamesofur>	 bearND: hmmm, weirdly when I click that link it is not updated
[00:42:31] <Jamesofur>	 (on my laptop not my phone)
[00:42:39] <Jamesofur>	 has the porn pic
[00:42:49] <paladox>	 no_justification bug is fixed now heh, just need to get upstream to merge.
[00:43:51] * Josve05a is getting a few emails about the P0rn image at OTRS..guessing iOS cache/fetch issue from a few week ago?
[00:45:00] <no_justification>	 Josve05a: Already reported, people are looking into it
[00:45:04] <no_justification>	 Ty though
[00:45:28] <Josve05a>	 Yeah, just wanted to check if it is the same issue, so I know how to respond
[00:45:31] <bearND>	 I'm wondering if it's something in Varnish. I've purged that thing in RESTBase storage.
[00:45:54] <bearND>	 curl -H 'Cache-Control: no-cache' http://restbase1007.eqiad.wmnet:7231/en.wikipedia.org/v1/feed/featured/2017/10/20
[00:46:22] <Jamesofur>	 when I refresh the page it seems to have worked
[00:47:03] <Jamesofur>	 I think tzatziki just saw it work on the ios app too \o/
[00:47:32] <tzatziki>	 Yeah it seems to be good now!!
[00:48:06] <bearND>	 Ah, good. Back to dinner then. :)
[00:48:31] <Jamesofur>	 thanks bearND !
[00:48:46] <Josve05a>	 I just did a clean install of the beta version of the iOS app, it is fixed there too :)
[00:49:00] <bearND>	 Basically got to run this from the cluster to purge the feed content. That's also documented a bit here: https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/RESTBase_services_for_apps/Deployment_process#Troubleshooting_.26_Restarting_services
[00:50:33] <bearND|afk>	 Pchelolo: mobrovac ^^ see backscroll about more vandalism (possibly another episode of https://phabricator.wikimedia.org/T174993)
[00:51:57] <tzatziki>	 thanks so much bearND|afk :D
[01:04:14] <icinga-wm>	 PROBLEM - Check health of redis instance on 6481 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 1508547849 600 - REDIS 2.8.17 on 127.0.0.1:6481 has 1 databases (db0) with 4202408 keys, up 4 minutes 6 seconds - replication_delay is 1508547849
[01:04:15] <icinga-wm>	 PROBLEM - Check health of redis instance on 6480 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 1508547849 600 - REDIS 2.8.17 on 127.0.0.1:6480 has 1 databases (db0) with 4205057 keys, up 4 minutes 6 seconds - replication_delay is 1508547849
[01:04:44] <icinga-wm>	 PROBLEM - Check health of redis instance on 6479 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 1508547880 600 - REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 4200888 keys, up 4 minutes 37 seconds - replication_delay is 1508547880
[01:05:45] <icinga-wm>	 RECOVERY - Check health of redis instance on 6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 4201107 keys, up 5 minutes 37 seconds - replication_delay is 0
[01:06:15] <icinga-wm>	 RECOVERY - Check health of redis instance on 6481 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6481 has 1 databases (db0) with 4195972 keys, up 6 minutes 7 seconds - replication_delay is 0
[01:06:15] <icinga-wm>	 RECOVERY - Check health of redis instance on 6480 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6480 has 1 databases (db0) with 4198865 keys, up 6 minutes 7 seconds - replication_delay is 0
[01:13:24] <icinga-wm>	 RECOVERY - MegaRAID on analytics1029 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy
[02:33:24] <icinga-wm>	 PROBLEM - MegaRAID on analytics1029 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough
[02:41:39] <Krinkle>	 That's a lot of WriteThrough
[02:43:24] <icinga-wm>	 RECOVERY - MegaRAID on analytics1029 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy
[03:13:24] <icinga-wm>	 PROBLEM - MegaRAID on analytics1029 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough
[03:26:14] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 785.13 seconds
[04:00:24] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 256.84 seconds
[04:25:41] <wikibugs>	 (03CR) 10Dzahn: extdist: use profile::labs::lvm::srv instead of role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/385477 (owner: 10Hashar)
[04:27:04] <icinga-wm>	 PROBLEM - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is CRITICAL: CRITICAL: 46.67% of data above the critical threshold [140.0]
[04:36:38] <Josve05a>	 Does anyone here have any contct with someone at Google (involved with Google Knowledge Graph perhps)? Iknow someone said in a Wikipedia group on Facebook to contact him if there as any issues with it, but I can't find him...
[05:23:24] <icinga-wm>	 RECOVERY - MegaRAID on analytics1029 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy
[05:30:40] <wikibugs>	 (03CR) 10Dzahn: [C: 04-2] "too old now to still rebase and be useful? as you can see i tried this back in May but yea..it's messy now" [puppet] - 10https://gerrit.wikimedia.org/r/355156 (owner: 10Dzahn)
[05:32:14] <wikibugs>	 (03Abandoned) 10Dzahn: contint: role/profile conversion [puppet] - 10https://gerrit.wikimedia.org/r/355156 (owner: 10Dzahn)
[05:44:00] <wikibugs>	 (03CR) 10Dzahn: "also one reason to delete this is that it's one of the few things left using the mysql class (T165625)" [puppet] - 10https://gerrit.wikimedia.org/r/385495 (owner: 10Dzahn)
[05:51:12] <wikibugs>	 (03PS10) 10Dzahn: gerrit: let Apache proxy only listen on service IP [puppet] - 10https://gerrit.wikimedia.org/r/354078
[05:52:18] <wikibugs>	 (03CR) 10Dzahn: [C: 04-1] gerrit: let Apache proxy only listen on service IP (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/354078 (owner: 10Dzahn)
[05:53:24] <icinga-wm>	 PROBLEM - MegaRAID on analytics1029 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough
[06:04:47] <wikibugs>	 (03PS11) 10Dzahn: gerrit: let Apache proxy only listen on service IP [puppet] - 10https://gerrit.wikimedia.org/r/354078
[06:05:17] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] gerrit: let Apache proxy only listen on service IP [puppet] - 10https://gerrit.wikimedia.org/r/354078 (owner: 10Dzahn)
[06:09:22] <wikibugs>	 (03PS12) 10Dzahn: gerrit: let Apache proxy only listen on service IP [puppet] - 10https://gerrit.wikimedia.org/r/354078
[06:11:27] <wikibugs>	 (03CR) 10Dzahn: "re: comments on Freddy's change: he did nothing wrong, we were at Wikimania and i was showing him Gerrit and it was part of a demo in a wo" [puppet] - 10https://gerrit.wikimedia.org/r/354078 (owner: 10Dzahn)
[06:15:10] <wikibugs>	 (03CR) 10Dzahn: "http://puppet-compiler.wmflabs.org/8409/" [puppet] - 10https://gerrit.wikimedia.org/r/354078 (owner: 10Dzahn)
[06:53:24] <icinga-wm>	 RECOVERY - MegaRAID on analytics1029 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy
[07:23:24] <icinga-wm>	 PROBLEM - MegaRAID on analytics1029 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough
[08:08:18] <hashar>	 !log Stopping Zuul to flush its queue
[08:08:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:16:29] <hashar>	 !log Mass code-review+2 changes made by LibraryUpdater that already had Code-Review:+2 and NOT verified=-1  ( ping legoktm )
[08:16:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:20:49] <hasharAway>	 (it is harmless mostly.  We now have a bot that massively send patches to mediawiki extensions)
[08:20:54] <hasharAway>	 should be fine now
[08:22:34] <icinga-wm>	 RECOVERY - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0]
[08:23:24] <icinga-wm>	 RECOVERY - MegaRAID on analytics1029 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy
[08:26:08] <hasharAway>	 Work requests waiting in Zuul Gearman server  -> it will come back eventually but I will monitor it over the next few hours
[08:26:13] <hasharAway>	 good week-end
[09:43:25] <icinga-wm>	 PROBLEM - MegaRAID on analytics1029 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough
[10:23:25] <icinga-wm>	 RECOVERY - MegaRAID on analytics1029 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy
[10:53:25] <icinga-wm>	 PROBLEM - MegaRAID on analytics1029 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough
[11:17:28] <wikibugs>	 10Operations, 10Traffic, 10netops: Japanese hotel resolving to esams and going the long way round - https://phabricator.wikimedia.org/T178726#3701226 (10Reedy)
[11:18:54] <wikibugs>	 10Operations, 10Traffic, 10netops: Japanese hotel resolving to esams and going the long way round - https://phabricator.wikimedia.org/T178726#3700842 (10Reedy) 05Open>03Invalid Seem this may have "resolved itself"  Now:  ``` $ dig en.wikipedia.org  ; <<>> DiG 9.9.7-P3 <<>> en.wikipedia.org ;; global opti...
[11:21:47] <Zoranzoki21>	 HI
[11:21:50] <Zoranzoki21>	 I have one question
[11:22:06] <Zoranzoki21>	 To add this patch: https://gerrit.wikimedia.org/r/#/c/385771/ on deployments table or no?
[11:22:22] <Zoranzoki21>	 Optional, you can deploy this right now. :D
[11:27:28] <p858snake>	 no, that is for core so it can go out in the next deployment cycle 
[11:27:48] <p858snake>	 s/cycle/train/
[11:30:32] <Zoranzoki21>	 when patch will be deployed?
[11:31:57] <p858snake>	 when the train next happens, its just removing a unused language so its not urgent
[11:33:43] <p858snake>	 although i'm not sure removing it is correct, so we still have wikis using it according to that task
[11:35:45] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 180056.95 seconds
[11:49:50] <wikibugs>	 10Operations, 10media-storage, 10User-fgiunchedi: Deleting file on Commons "Error deleting file: An unknown error occurred in storage backend "local-multiwrite"." - https://phabricator.wikimedia.org/T173374#3701236 (10Jcb) I have tried to delete the files several times in the past few days, but I cannot.
[12:33:25] <icinga-wm>	 RECOVERY - MegaRAID on analytics1029 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy
[13:13:25] <icinga-wm>	 PROBLEM - MegaRAID on analytics1029 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough
[13:43:25] <icinga-wm>	 RECOVERY - MegaRAID on analytics1029 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy
[14:13:24] <icinga-wm>	 PROBLEM - MegaRAID on analytics1029 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough
[14:37:15] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 38 probes of 281 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[14:42:15] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 8 probes of 281 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[15:29:36] <wikibugs>	 10Operations, 10Goal, 10Technical-Debt, 10User-fgiunchedi: Reduce technical debt in metrics monitoring - https://phabricator.wikimedia.org/T177195#3701369 (10Dzahn)
[15:29:39] <wikibugs>	 10Operations, 10monitoring, 10Patch-For-Review: Uninstall ganglia from the fleet - https://phabricator.wikimedia.org/T177225#3701368 (10Dzahn)
[15:33:25] <icinga-wm>	 RECOVERY - MegaRAID on analytics1029 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy
[16:03:25] <icinga-wm>	 PROBLEM - MegaRAID on analytics1029 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough
[16:05:57] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics: Possibly faulty BBU on analytics1029 - https://phabricator.wikimedia.org/T178742#3701391 (10elukey)
[16:08:18] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics: Possibly faulty BBU on analytics1029 - https://phabricator.wikimedia.org/T178742#3701403 (10elukey) Tried with `sudo megacli -AdpBbuCmd -BbuLearn -aALL` but the battery state seems still to be unknown and not charging :(  @Cmjohnson hi! I think that we might need a new...
[16:08:28] <elukey>	 going to ack the alarms for an1029
[16:09:05] <icinga-wm>	 ACKNOWLEDGEMENT - MegaRAID on analytics1029 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough Elukey Probably faulty BBU battery - T178742
[16:54:41] <wikibugs>	 (03PS1) 10Ladsgroup: labs: Disable reverted and wp10 in enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/385794
[16:57:31] <wikibugs>	 (03CR) 10Ladsgroup: [C: 032] "The ores in labs is broken and this is a labs thing only :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/385794 (owner: 10Ladsgroup)
[17:06:10] <wikibugs>	 (03Merged) 10jenkins-bot: labs: Disable reverted and wp10 in enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/385794 (owner: 10Ladsgroup)
[17:21:46] <wikibugs>	 (03CR) 10jenkins-bot: labs: Disable reverted and wp10 in enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/385794 (owner: 10Ladsgroup)
[18:03:25] <icinga-wm>	 RECOVERY - MegaRAID on analytics1029 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy
[18:43:25] <icinga-wm>	 PROBLEM - MegaRAID on analytics1029 is CRITICAL: CRITICAL: 13 LD(s) must have write cache policy WriteBack, currently using: WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough, WriteThrough
[18:53:39] <elukey>	 acked again and set downtime to avoid spam --^
[20:33:25] <icinga-wm>	 RECOVERY - MegaRAID on analytics1029 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy
[21:39:55] <icinga-wm>	 PROBLEM - puppet last run on cp3035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:09:55] <icinga-wm>	 RECOVERY - puppet last run on cp3035 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures