[02:10:21] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 18989296 and 0 seconds
[02:15:13] PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 25531216 and 0 seconds
[02:15:13] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 22686640 and 0 seconds
[02:17:39] RECOVERY - Postgres Replication Lag on maps1002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 3456 and 51 seconds
[02:17:39] RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 3456 and 51 seconds
[02:17:39] RECOVERY - Postgres Replication Lag on maps1001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 3456 and 51 seconds
[02:58:39] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1004 is CRITICAL: CRITICAL: 90.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[03:37:51] PROBLEM - puppet last run on mw2277 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIPCity.dat.gz]
[04:03:49] RECOVERY - puppet last run on mw2277 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[06:28:21] PROBLEM - netbox HTTPS on netmon1002 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 547 bytes in 0.155 second response time https://wikitech.wikimedia.org/wiki/Netbox
[06:30:27] PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:37:53] RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational
[06:38:15] RECOVERY - netbox HTTPS on netmon1002 is OK: HTTP OK: HTTP/1.1 302 Found - 348 bytes in 0.661 second response time https://wikitech.wikimedia.org/wiki/Netbox
[07:09:01] PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[07:10:03] RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.576 second response time https://phabricator.wikimedia.org/T174916
[07:15:05] PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[07:16:13] RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 6.143 second response time https://phabricator.wikimedia.org/T174916
[07:21:07] PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[07:23:23] RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.073 second response time https://phabricator.wikimedia.org/T174916
[07:27:07] PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[08:49:07] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 36 probes of 393 (alerts on 35) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[08:49:56] !log restart pdfrender on scb1004
[08:49:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:50:47] RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.074 second response time https://phabricator.wikimedia.org/T174916
[08:54:19] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 18 probes of 393 (alerts on 35) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[10:27:14] (PS15) Daimona Eaytoy: Update AbuseFilter config to keep the status quo [mediawiki-config] - https://gerrit.wikimedia.org/r/475772
[10:27:22] (PS14) Daimona Eaytoy: Move all AbuseFilter config to abusefilter.php [mediawiki-config] - https://gerrit.wikimedia.org/r/477063 (https://phabricator.wikimedia.org/T145931)
[10:27:28] (PS5) Daimona Eaytoy: Remove $wgAbuseFilterRuntimeProfile [mediawiki-config] - https://gerrit.wikimedia.org/r/486470 (https://phabricator.wikimedia.org/T191039)
[10:33:59] (CR) jerkins-bot: [V: -1] Update AbuseFilter config to keep the status quo [mediawiki-config] - https://gerrit.wikimedia.org/r/475772 (owner: Daimona Eaytoy)
[11:00:20] Operations, Gerrit, Release-Engineering-Team, User-greg: gerrit.wikimedia.org is down - https://phabricator.wikimedia.org/T218472 (JruwJN) #operations #security-team #release-engineering-team Why you don't tell the people that the real reason for stopping gerrit is the compromising of an ops acc...
[11:03:18] I just disabled this account ^
[11:03:34] for any ops/security if they are around
[11:21:44] Thanks Amir1
[11:26:16] Operations, Gerrit, Release-Engineering-Team, User-greg: gerrit.wikimedia.org is down - https://phabricator.wikimedia.org/T218472 (thiemowmde) Resolved→Open I can't login to Gerrit any more. I created T218507, not knowing what was going on. Given this ticket doesn't explain much, I still...
[11:28:04] Operations, Gerrit, Release-Engineering-Team, User-greg: gerrit.wikimedia.org is down - https://phabricator.wikimedia.org/T218472 (JruwJN2) I repeat my question again: #operations #security-team #release-engineering-team Why you don't tell the people that the real reason for stopping gerrit is...
[11:28:54] Amir1: ---^
[11:29:30] elukey: done
[11:34:23] Hi there - any phab admin?
[11:34:28] Daimona: yup
[11:35:00] Amir1: https://phabricator.wikimedia.org/p/JruwJN2/
[11:35:06] He is persistent
[11:36:05] I see, I don't have the right but I think I know what's needed
[11:36:12] wait a minute
[11:38:07] Ok
[11:51:57] !log ladsgroup@mwmaint1002:~$ mwscript maintenance/createAndPromote.php --wiki=labswiki --force --sysop Ladsgroup
[11:51:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:53:38] Operations, Gerrit, Release-Engineering-Team: I'm refused to login to Gerrit - https://phabricator.wikimedia.org/T218507 (MarcoAurelio) Better tags.
[12:02:22] I'm around. If you see anything, just ping me
[12:23:26] Operations, Gerrit, Release-Engineering-Team, User-greg: gerrit.wikimedia.org is down - https://phabricator.wikimedia.org/T218472 (MGChecker) I don't think removing these comments is a good idea, as people subscribed to this task recieve the respective email notifications anyway and are left with...
[12:37:00] Operations, Gerrit, Release-Engineering-Team, User-greg: gerrit.wikimedia.org is down - https://phabricator.wikimedia.org/T218472 (Daimona) >>! In T218472#5030487, @MGChecker wrote: > I don't think removing these comments is a good idea, as people subscribed to this task recieve the respective em...
[12:41:45] Any issues with the job queue?
[12:41:49] https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress?username=Milton_Osmani_Reyes_Encalada refuses to start
[12:44:11] PROBLEM - puppet last run on an-worker1091 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:44:20] Operations, Gerrit, Release-Engineering-Team, User-greg: gerrit.wikimedia.org is down - https://phabricator.wikimedia.org/T218472 (akosiaris) I think the intention are to (somewhat ?) limit the impact the vandal is trying to achieve (at least by removing the capability to link to those comments)....
[12:46:38] Operations, Gerrit, Release-Engineering-Team: I'm refused to login to Gerrit - https://phabricator.wikimedia.org/T218507 (Dereckson) @thiemowmde Did you try your username in lowercase?
[12:51:43] Operations, Gerrit, Release-Engineering-Team: I'm refused to login to Gerrit - https://phabricator.wikimedia.org/T218507 (MGChecker) >>! In T218507#5030499, @Dereckson wrote: > @thiemowmde Did you try your username in lowercase? When using git review yesterday, I couldn't authenticate until I wrote...
[12:53:06] Operations, Gerrit, Release-Engineering-Team, User-greg: gerrit.wikimedia.org is down - https://phabricator.wikimedia.org/T218472 (MGChecker) I do not really get how this is related to the impact of the vandal. as he basically states a single claim that would have to be denied or confirmed anyway...
[12:56:46] hauskatze: it seems okay now. right?
[12:56:55] let me see
[12:57:03] I went doing other things meanwhile :)
[12:57:13] Yup, finally
[12:57:28] I asked because global renames are mostly instant these days
[13:02:23] https://grafana.wikimedia.org/d/000000400/jobqueue-eventbus?orgId=1 might be the jobqueue was busy
[13:02:35] PROBLEM - puppet last run on analytics1039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:07:47] RECOVERY - puppet last run on analytics1039 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[13:08:05] Operations, Gerrit, Release-Engineering-Team, User-greg: gerrit.wikimedia.org is down - https://phabricator.wikimedia.org/T218472 (Aklapper) //No need to keep random unconfirmed claims posted by an account with previously zero activity to spread FUD, jump guns, create confusion. Please refer to...
[13:08:58] Operations, Gerrit, Release-Engineering-Team, User-greg: gerrit.wikimedia.org is down - https://phabricator.wikimedia.org/T218472 (MarcoAurelio) IMHO I don't think spreading [[ https://en.wikipedia.org/wiki/Fear,_uncertainty_and_doubt | FUD ]] helps anyone. Wikimedia will let us all know in due t...
[13:15:25] RECOVERY - puppet last run on an-worker1091 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[13:41:43] Hi, Did someone edit All-Users?
[13:45:43] Someone’s reporting watchlists problems and that’s the repo where it’s stored
[13:49:00] that's refs/meta/config repo only right?
[13:49:16] Nope
[13:49:23] It has refs/*
[13:49:36] Basically it stores all accounts in that repo
[13:50:00] I forget the ref spec but I think it has refs/users
[13:50:34] I cannot see anything there - perhaps only Admins can see it?
[13:50:43] Yup
[13:50:50] Only the user can see there own ref
[13:50:55] And admins can see all
[13:52:37] warning: remote HEAD refers to nonexistent ref, unable to checkout.
[13:52:42] Yup
[13:52:49] Then you edit .git/config
[13:52:59] Changing refs/heads/ with refs/
[13:53:02] Then git pull
[13:53:41] fetch = +refs/heads/*:refs/remotes/origin/* <-- this ?
[13:53:45] Yup
[13:53:54] fetch = +refs/
[13:54:00] or refs/* ?
[13:54:05] +refs/*
[13:54:12] fetch = +refs/*:refs/remotes/origin/* <-- this ?
[13:54:28] That ^^ forgot to cut the end off :)
[13:56:41] that worked
[13:56:44] lunch, see you
[15:51:32] Hello - someone from Security online?
[15:52:04] (Unsure where to ask)
[15:54:52] Perhaps chasemp, sbassett (?)
[16:08:24] Wassup Daimona?
[16:27:54] Operations, Gerrit, Release-Engineering-Team, User-greg: gerrit.wikimedia.org is down - https://phabricator.wikimedia.org/T218472 (JBennett) Todays updates sent to wikitech-l https://lists.wikimedia.org/pipermail/wikitech-l/2019-March/091769.html
[16:29:04] (PS1) Paladox: Merge branch 'stable-2.15' into wmf/stable-2.15 [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/497120
[16:29:49] vulnerability in Gerrit?
[16:30:33] yeah...
[16:31:00] Someone needs to report that upstream please.
[16:32:44] using https://bugs.chromium.org/p/gerrit/issues/entry?template=Security+Issue
[16:32:54] It will be done when appropriate
[16:33:17] paladox: We do know how to operate with other people that make software ;)
[16:33:43] ok
[16:45:11] (PS2) Paladox: Merge branch 'stable-2.15' into wmf/stable-2.15 [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/497120 (https://phabricator.wikimedia.org/T218515)
[16:55:35] PROBLEM - LVS HTTP IPv4 on eventgate-analytics.svc.eqiad.wmnet is CRITICAL: connect to address 10.2.2.42 and port 31192: Connection refused https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[16:56:47] RECOVERY - LVS HTTP IPv4 on eventgate-analytics.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 801 bytes in 0.074 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[17:00:58] Operations, Gerrit, Release-Engineering-Team: I'm refused to login to Gerrit - https://phabricator.wikimedia.org/T218507 (thiemowmde) Open→Invalid I'm in contact with the security team. Long story short: It's not me.
[17:21:57] oh, i've discovered a bug in PolyGerrit.
[17:24:06] just one?
[17:26:25] Reedy no, i've made many bugs :P (but this one should really be checking the auth policy).
[17:28:40] That.. sounds like a security bug
[17:30:56] kind of yes but cannot be executed unless there's a bug with the backend. (though im already working on a fix).
[17:40:52] i've submitted this fix https://gerrit-review.googlesource.com/c/gerrit/+/218172/
[17:41:54] is it restricted?
[17:41:56] Error 404 (Not Found): Not found: gerrit~218172
[17:42:06] yes
[17:42:47] well, their filter works correctly, then
[17:50:22] yup
[17:52:13] 404 instead of 403 though
[17:53:13] yeh, done on purpose
[17:53:16] You really shouldn't expect google to be competent at writing software
[17:53:24] I know
[17:53:27] as PolyGerrit uses the rest api
[17:53:42] 404 instead of 403 is just security through obscurity
[17:53:45] I just dislike systems which do this
[17:54:06] the 404 is comming from https://github.com/GerritCodeReview/gerrit/blob/master/polygerrit-ui/app/elements/gr-app.js#L338
[18:03:45] Operations, Services, VisualEditor, Readers-Web-Backlog (Tracking), Wikimedia-production-error: [Bug] Sporadic 503 errors when editing - https://phabricator.wikimedia.org/T218252 (Jdlrobson) I dont know anything about the action=visualeditor. Am hoping editing do (its tagged #VisualEditor but...
[18:46:41] Operations, Traffic, Wikidata, Wikidata-Query-Service, User-Smalyshev: Reduce / remove the aggessive cache busting behaviour of wdqs-updater - https://phabricator.wikimedia.org/T217897 (Smalyshev)
[19:36:54] Operations, Gerrit, Release-Engineering-Team, User-greg: gerrit.wikimedia.org is down - https://phabricator.wikimedia.org/T218472 (thiemowmde) Open→Resolved Gerrit is not "down" any more, reopening this was a mistake. Sorry.
[20:44:59] (PS3) BryanDavis: toolforge: Cleanup host_aliases and exim4 conf for Trusty grid [puppet] - https://gerrit.wikimedia.org/r/496680 (https://phabricator.wikimedia.org/T109485)
[21:54:59] PROBLEM - Nginx local proxy to apache on mw1325 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 8.630 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[21:54:59] PROBLEM - Apache HTTP on mw1325 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 6.371 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[21:56:03] RECOVERY - Nginx local proxy to apache on mw1325 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.196 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[21:56:05] RECOVERY - Apache HTTP on mw1325 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.117 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[23:29:21] (PS1) BryanDavis: cloud-vps: Remove mysql packages from openstack::clientpackages::mitaka::* [puppet] - https://gerrit.wikimedia.org/r/497210 (https://phabricator.wikimedia.org/T218009)
[23:30:29] (CR) BryanDavis: [C: -1] "Needs discussion, but posting so I can cherry-pick and stop things from getting more messed up in Toolforge by this." [puppet] - https://gerrit.wikimedia.org/r/497210 (https://phabricator.wikimedia.org/T218009) (owner: BryanDavis)
[23:30:52] (PS2) BryanDavis: cloud-vps: Remove mysql packages from openstack::clientpackages::mitaka::* [puppet] - https://gerrit.wikimedia.org/r/497210 (https://phabricator.wikimedia.org/T218009)
[23:41:20] (CR) BryanDavis: [C: -1] "Cherry-picked to tools-puppetmaster-01 as a quick fix for T218494" [puppet] - https://gerrit.wikimedia.org/r/497210 (https://phabricator.wikimedia.org/T218009) (owner: BryanDavis)
[23:54:21] (PS1) BryanDavis: toolforge: Install qstat-full in /usrr/local/bin [puppet] - https://gerrit.wikimedia.org/r/497216 (https://phabricator.wikimedia.org/T218504)
[23:55:35] (PS2) BryanDavis: toolforge: Install qstat-full in /usr/local/bin [puppet] - https://gerrit.wikimedia.org/r/497216 (https://phabricator.wikimedia.org/T218504)