[00:23:15] PROBLEM - Apache HTTP on mw1197 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.073 second response time [00:23:45] PROBLEM - HHVM rendering on mw1197 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.073 second response time [00:24:15] RECOVERY - Apache HTTP on mw1197 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.199 second response time [00:24:45] RECOVERY - HHVM rendering on mw1197 is OK: HTTP OK: HTTP/1.1 200 OK - 76425 bytes in 1.367 second response time [00:58:25] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received [00:58:35] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [00:58:35] PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [00:58:35] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [00:59:25] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [01:00:40] Hi, this is from Taiwanese Wikimedians. 
We need a bureaucrat's help to grant User:Koala0090 [01:01:05] the one-day right to create more than 6 accounts [01:01:25] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy [01:01:25] RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy [01:01:25] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy [01:01:26] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy [01:01:40] Because he is hosting a workshop for high school students in Hualien, Taiwan [01:02:11] See here for the workshop details (in Chinese). Could an administrator please help grant Wikipedia user User:Koala0090 the right to create multiple accounts for one day? For details, see https://zh.m.wikipedia.org/wiki/Wikipedia:%E8%87%BA%E7%81%A3%E6%95%99%E8%82%B2%E5%B0%88%E6%A1%88/%E6%85%88%E4%B8%AD%E7%B6%AD%E5%9F%BA%E7%B7%A8%E8%AD%AF%E5%AF%AB%E4%BD%9C%E5%9D%8A [01:02:15] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy [01:04:50] Please, anyone? [01:06:50] It seems like nobody is here. Thanks anyway! 
[02:21:43] !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 30s) [02:21:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:27:46] !log l10nupdate@tin ResourceLoader cache refresh completed at Sun May 21 02:27:46 UTC 2017 (duration 6m 3s) [02:27:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:46:25] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received [02:46:35] PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [02:46:35] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [02:46:35] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [02:47:25] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy [02:47:25] RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy [02:47:25] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy [02:47:25] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy [02:56:25] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [02:56:35] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received [02:56:35] PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [02:56:35] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [02:58:15] RECOVERY - citoid endpoints health on scb1004 
is OK: All endpoints are healthy [02:58:25] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy [02:58:25] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy [02:58:25] RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy [03:33:05] PROBLEM - puppet last run on mw2240 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz] [04:01:05] RECOVERY - puppet last run on mw2240 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [04:08:45] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=3514.10 Read Requests/Sec=4102.90 Write Requests/Sec=11.80 KBytes Read/Sec=30697.60 KBytes_Written/Sec=71.20 [04:16:45] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=7.00 Read Requests/Sec=0.60 Write Requests/Sec=5.30 KBytes Read/Sec=2.80 KBytes_Written/Sec=179.20 [04:25:51] Operations, ops-eqiad: Degraded RAID on db1024 - https://phabricator.wikimedia.org/T165934#3280907 (ops-monitoring-bot) [04:51:35] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received [04:51:35] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [04:51:35] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [04:51:35] PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [04:51:35] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [04:54:25] RECOVERY - citoid endpoints health on scb1003 is OK: All 
endpoints are healthy [04:54:25] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy [04:54:25] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy [04:54:25] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy [04:54:25] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy [05:57:16] PROBLEM - citoid endpoints health on scb2002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [05:57:25] PROBLEM - citoid endpoints health on scb2005 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [05:57:25] PROBLEM - citoid endpoints health on scb2001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [05:57:25] PROBLEM - citoid endpoints health on scb2003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [05:57:25] PROBLEM - citoid endpoints health on scb2006 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [05:58:25] PROBLEM - citoid endpoints health on scb2004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [05:58:25] PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received [06:00:15] RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy [06:00:15] RECOVERY - citoid endpoints health on scb2005 is OK: All endpoints are healthy [06:00:15] RECOVERY - citoid endpoints health on scb2003 is OK: All endpoints are healthy [06:00:15] RECOVERY - citoid endpoints health on scb2006 is OK: All endpoints are healthy [06:00:15] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy [06:00:15] RECOVERY - citoid endpoints health on scb2004 is OK: All endpoints are healthy [06:00:16] RECOVERY - Citoid LVS 
codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy [06:39:58] Operations, Wikimedia-SVG-rendering, Upstream: librsvg misinterpret quoted font family names that contain whitespaces - https://phabricator.wikimedia.org/T64987#3280940 (Perhelion) >>! In T64987#3279102, @Aklapper wrote: > @Perhelion: Does that mean https://bugzilla.gnome.org/show_bug.cgi?id=739329 s... [06:47:05] PROBLEM - puppet last run on labtestcontrol2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tzdata] [06:55:11] Operations, Wikimedia-SVG-rendering, Upstream: librsvg misinterpret quoted font family names that contain whitespaces - https://phabricator.wikimedia.org/T64987#3280941 (Perhelion) [06:58:09] Operations, Wikimedia-SVG-rendering, Upstream: librsvg misinterpret quoted font family names that contain whitespaces - https://phabricator.wikimedia.org/T64987#3280944 (Perhelion) [07:16:05] RECOVERY - puppet last run on labtestcontrol2001 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [08:21:29] (CR) Nemo bis: Enable ValidationStatistics log for FlaggedRevs (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/354615 (https://phabricator.wikimedia.org/T163107) (owner: Nemo bis) [08:31:35] Operations, Ops-Access-Requests: Access to search logs for Jan Dittrich - https://phabricator.wikimedia.org/T165943#3281090 (Jan_Dittrich) [08:33:11] i'm still getting that 500 on enhanced watchlist. can that be rolled back until it's fixed? [08:33:35] (or the fix deployed) [08:35:45] Where's the patch again? [08:37:28] https://gerrit.wikimedia.org/r/#/c/354602/ looks related [08:37:31] But wasn't there another? [08:38:37] https://gerrit.wikimedia.org/r/#/c/350914/ ? 
[08:38:59] (PS2) Nemo bis: Remove $wgEnableValidationStatisticsUpdates from FlaggedRevs config [mediawiki-config] - https://gerrit.wikimedia.org/r/354600 [08:39:23] that's what gwicke linked yesterday iianm [08:41:04] Which is CR-1'd [08:42:38] idk... i just would like if it worked as it worked a week ago normally [08:44:54] Danny_B: i don't think there were any changes. it's an ongoing issue for users with large watchlists, i think. perhaps you're hitting a slower database server or something. [08:45:27] Danny_B: there was also something about this in tech news a week ago or two, a similar problem for users who had "watch categorization changes" enabled [08:46:18] Reedy: Do you think it'd be sane to just return the empty string there? [08:46:52] The callers use... [08:46:55] $data['historyLink'] = $this->getDiffHistLinks( $rcObj, $query ); [08:47:23] The callers use... [08:47:24] $data['historyLink'] = $this->getDiffHistLinks( $rcObj, $query ); [08:47:26] and then later [08:47:27] $line .= implode( '', $data ); [08:49:42] There's an if statement for type of rc entry, so in many places it just doesn't have a historyLink entry in the array [08:51:41] * bawolff made a new version returning an empty string [08:53:36] What does implode do with array keys? 
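On the implode() question: a minimal PHP sketch, assuming a hypothetical $data array (the keys and values below are illustrative and not taken from MediaWiki). PHP's implode() concatenates only the array's values, in order, so string keys never reach the output, and an empty-string entry such as a blank historyLink adds nothing to the line.

```php
<?php
// Hypothetical stand-in for the $data array built by the callers;
// the keys and values are illustrative only.
$data = [
    'timestampLink' => '12:00 ',
    'historyLink'   => '',        // empty string: contributes nothing to the result
    'userLink'      => 'Example',
];

// implode() joins the values in array order; keys are ignored.
$line = implode( '', $data );

// Dropping the keys first yields exactly the same string.
$same = implode( '', array_values( $data ) );

var_dump( $line === $same );
```

Under these assumptions, returning an empty string from the link helper and omitting the historyLink entry altogether both produce the same joined line.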
[08:55:07] it just ignores them [08:55:24] I think equivalent to implode( '', array_values( $data ) ); [09:06:45] !log smalyshev@tin Started deploy [wdqs/wdqs@227ab25]: Redeploy GUI due to breakage in T165228 [09:06:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:06:54] T165228: Query results are downloaded in wrong encoding - https://phabricator.wikimedia.org/T165228 [09:07:04] !log smalyshev@tin Finished deploy [wdqs/wdqs@227ab25]: Redeploy GUI due to breakage in T165228 (duration: 00m 19s) [09:07:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:13:25] PROBLEM - HHVM rendering on mw2125 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:14:15] RECOVERY - HHVM rendering on mw2125 is OK: HTTP OK: HTTP/1.1 200 OK - 76275 bytes in 0.176 second response time [09:19:08] (PS1) Reedy: Print dbname before running update.php [puppet] - https://gerrit.wikimedia.org/r/354919 [09:20:01] (CR) Greg Grossmeier: [C: 1] Print dbname before running update.php [puppet] - https://gerrit.wikimedia.org/r/354919 (owner: Reedy) [09:21:55] (CR) Rush: [C: 2] Print dbname before running update.php [puppet] - https://gerrit.wikimedia.org/r/354919 (owner: Reedy) [09:22:00] (CR) Rush: [V: 2 C: 2] "seems to only effect beta and greg gave a +1 seems fine to me" [puppet] - https://gerrit.wikimedia.org/r/354919 (owner: Reedy) [09:22:16] :) [09:22:27] "blame greg if it breaks" [09:22:52] See if we can see which db is brokened [09:22:57] Hey joal. Someone looking for you in the Atrium. [09:23:04] greg-g: nahhhhhhhhhh but you're mr. 
beta [09:24:09] :) [09:42:06] !log force ran puppet on deployment-tin to pickup dbname in wmf-beta-update-database.py [09:42:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:58:22] (PS1) Reedy: Do the echo when running update.php [puppet] - https://gerrit.wikimedia.org/r/354932 [10:12:28] (PS1) Filippo Giunchedi: Test for unreferenced files introduced by changes [software/puppet-compiler] - https://gerrit.wikimedia.org/r/354939 [10:14:13] ACKNOWLEDGEMENT - Check systemd state on labstore2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. andrew bogott This box is a WIP [10:15:01] (CR) jerkins-bot: [V: -1] Test for unreferenced files introduced by changes [software/puppet-compiler] - https://gerrit.wikimedia.org/r/354939 (owner: Filippo Giunchedi) [10:56:25] (PS3) Mark Bergsma: Use a bytearray to encode prefixes in BGP.encodePrefixes [debs/pybal] - https://gerrit.wikimedia.org/r/354685 [10:56:27] (PS5) Mark Bergsma: Use a bytearray to build IPPrefix [debs/pybal] - https://gerrit.wikimedia.org/r/354711 [10:56:29] (PS2) Mark Bergsma: Create new BGP message classes for incremental construction [debs/pybal] - https://gerrit.wikimedia.org/r/354684 [10:56:31] (PS5) Mark Bergsma: Adapt NaiveBGPPeering to support UPDATE message overflow [debs/pybal] - https://gerrit.wikimedia.org/r/354686 [10:56:33] (PS3) Mark Bergsma: Allow for withdrawals and NLRI to be sent in the same UPDATE [debs/pybal] - https://gerrit.wikimedia.org/r/354723 [11:00:53] (CR) Mark Bergsma: [C: 2] Use a bytearray to encode prefixes in BGP.encodePrefixes [debs/pybal] - https://gerrit.wikimedia.org/r/354685 (owner: Mark Bergsma) [11:01:39] (Merged) jenkins-bot: Use a bytearray to encode prefixes in BGP.encodePrefixes [debs/pybal] - https://gerrit.wikimedia.org/r/354685 (owner: Mark Bergsma) [11:02:25] (PS2) Volans: Puppet compiler: automatically sync from 
all masters [puppet] - https://gerrit.wikimedia.org/r/354105 (https://phabricator.wikimedia.org/T165583) [11:02:43] (CR) Mark Bergsma: [C: 2] Use a bytearray to build IPPrefix [debs/pybal] - https://gerrit.wikimedia.org/r/354711 (owner: Mark Bergsma) [11:04:18] (Merged) jenkins-bot: Use a bytearray to build IPPrefix [debs/pybal] - https://gerrit.wikimedia.org/r/354711 (owner: Mark Bergsma) [11:05:36] (PS3) Giuseppe Lavagetto: Add netlink-based Ipvsmanager implementation [debs/pybal] - https://gerrit.wikimedia.org/r/302882 [11:05:42] (CR) jerkins-bot: [V: -1] Add netlink-based Ipvsmanager implementation [debs/pybal] - https://gerrit.wikimedia.org/r/302882 (owner: Giuseppe Lavagetto) [11:13:46] Operations, MW-1.30-release-notes, Performance-Team, Thumbor, Patch-For-Review: Thumbor should reject thumbnail requests that are the same size as the original or bigger - https://phabricator.wikimedia.org/T150741#3281679 (Gilles) Assuming the above change works and we only need to run refres... 
[11:18:26] (PS5) Volans: Puppet: run-puppet-agent, add --failed-only option [puppet] - https://gerrit.wikimedia.org/r/349416 [11:47:54] (PS1) Dereckson: Add techconduct.wikimedia.orgfor new private wiki [dns] - https://gerrit.wikimedia.org/r/354954 (https://phabricator.wikimedia.org/T165977) [11:48:23] (PS2) Dereckson: Add techconduct.wikimedia.org for new private wiki [dns] - https://gerrit.wikimedia.org/r/354954 (https://phabricator.wikimedia.org/T165977) [11:50:19] (PS4) Mark Bergsma: Allow for withdrawals and NLRI to be sent in the same UPDATE [debs/pybal] - https://gerrit.wikimedia.org/r/354723 [11:50:21] (PS1) Mark Bergsma: Add GPLv2 license header to bgp.py [debs/pybal] - https://gerrit.wikimedia.org/r/354955 [12:11:28] (PS1) Dereckson: Apache: add techconduct.wm.o to remnant sites [puppet] - https://gerrit.wikimedia.org/r/354959 (https://phabricator.wikimedia.org/T165977) [12:11:56] (CR) Mark Bergsma: [C: -2] "bgp.py should not in any way depend on pybal classes" [debs/pybal] (1.13) - https://gerrit.wikimedia.org/r/344659 (owner: Ema) [12:12:49] (CR) Mark Bergsma: [C: -2] "bgp.py should not in any way depend on pybal classes" [debs/pybal] - https://gerrit.wikimedia.org/r/354677 (owner: Ema) [12:13:37] (PS1) Dereckson: Don't replicate techconductwiki to labs [puppet] - https://gerrit.wikimedia.org/r/354960 (https://phabricator.wikimedia.org/T165977) [12:14:19] (CR) Mark Bergsma: [C: -1] "I like moving the IPPrefix classes to a separate module, as long as we keep it separate and independent of pybal." 
[debs/pybal] - https://gerrit.wikimedia.org/r/354746 (owner: Ema) [12:14:32] (CR) jerkins-bot: [V: -1] Don't replicate techconductwiki to labs [puppet] - https://gerrit.wikimedia.org/r/354960 (https://phabricator.wikimedia.org/T165977) (owner: Dereckson) [12:26:35] PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received [12:26:35] PROBLEM - citoid endpoints health on scb2005 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [12:26:35] PROBLEM - citoid endpoints health on scb2004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [12:26:35] PROBLEM - citoid endpoints health on scb2006 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [12:27:25] PROBLEM - citoid endpoints health on scb2001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [12:27:25] PROBLEM - citoid endpoints health on scb2003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [12:27:25] PROBLEM - citoid endpoints health on scb2002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [12:28:25] RECOVERY - citoid endpoints health on scb2003 is OK: All endpoints are healthy [12:28:25] RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy [12:28:35] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [12:28:35] PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [12:28:45] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received [12:28:45] PROBLEM - citoid endpoints health on scb1002 is 
CRITICAL: /api (open graph via native scraper) timed out before a response was received [12:29:25] RECOVERY - citoid endpoints health on scb2004 is OK: All endpoints are healthy [12:29:25] RECOVERY - citoid endpoints health on scb2006 is OK: All endpoints are healthy [12:29:25] RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy [12:29:35] RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy [12:29:35] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy [12:30:25] RECOVERY - citoid endpoints health on scb2005 is OK: All endpoints are healthy [12:30:25] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy [12:30:25] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy [12:30:35] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy [12:36:58] (PS2) Dereckson: Don't replicate techconductwiki to labs [puppet] - https://gerrit.wikimedia.org/r/354960 (https://phabricator.wikimedia.org/T165977) [12:48:35] PROBLEM - citoid endpoints health on scb2001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [12:48:35] PROBLEM - citoid endpoints health on scb2004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [12:48:35] PROBLEM - citoid endpoints health on scb2003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [12:48:35] PROBLEM - citoid endpoints health on scb2002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [12:48:35] PROBLEM - citoid endpoints health on scb2005 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [12:48:35] PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received [12:48:36] PROBLEM - citoid 
endpoints health on scb2006 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [12:50:25] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy [12:50:25] RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy [12:50:26] RECOVERY - citoid endpoints health on scb2004 is OK: All endpoints are healthy [12:50:26] RECOVERY - citoid endpoints health on scb2006 is OK: All endpoints are healthy [12:50:26] RECOVERY - citoid endpoints health on scb2003 is OK: All endpoints are healthy [12:50:26] RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy [12:50:26] RECOVERY - citoid endpoints health on scb2005 is OK: All endpoints are healthy [12:58:23] (PS1) Ema: Add unit tests for the Origin and BaseASPath BGP attributes [debs/pybal] - https://gerrit.wikimedia.org/r/354972 [12:58:45] (PS3) Volans: Puppet compiler: automatically sync from all masters [puppet] - https://gerrit.wikimedia.org/r/354105 (https://phabricator.wikimedia.org/T165583) [13:01:30] (Abandoned) Ema: bgp: log with util.log instead of printing to stdout [debs/pybal] - https://gerrit.wikimedia.org/r/354677 (owner: Ema) [13:13:02] (PS3) Filippo Giunchedi: prometheus: report puppet agent stats [puppet] - https://gerrit.wikimedia.org/r/354007 [13:13:04] (PS2) Filippo Giunchedi: base: report prometheus agent stats [puppet] - https://gerrit.wikimedia.org/r/354457 [13:13:06] (PS2) Filippo Giunchedi: prometheus: add alertmanager_url to prometheus server [puppet] - https://gerrit.wikimedia.org/r/354459 [13:13:08] (PS2) Filippo Giunchedi: role: use alertmanager in beta prometheus [puppet] - https://gerrit.wikimedia.org/r/354460 [13:13:10] (PS1) Filippo Giunchedi: role: set external url for prometheus beta [puppet] - https://gerrit.wikimedia.org/r/354975 [13:13:12] (PS1) Filippo Giunchedi: WIP prometheus::alertmanager [puppet] - 
https://gerrit.wikimedia.org/r/354976 [13:15:10] (CR) jerkins-bot: [V: -1] WIP prometheus::alertmanager [puppet] - https://gerrit.wikimedia.org/r/354976 (owner: Filippo Giunchedi) [13:19:35] PROBLEM - citoid endpoints health on scb2002 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [13:19:35] PROBLEM - citoid endpoints health on scb2001 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [13:19:35] PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (open graph via native scraper) timed out before a response was received [13:19:35] PROBLEM - citoid endpoints health on scb2005 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [13:19:35] PROBLEM - citoid endpoints health on scb2004 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [13:19:35] PROBLEM - citoid endpoints health on scb2003 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [13:19:35] PROBLEM - citoid endpoints health on scb2006 is CRITICAL: /api (open graph via native scraper) timed out before a response was received [13:21:25] RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy [13:21:35] RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy [13:22:25] RECOVERY - citoid endpoints health on scb2004 is OK: All endpoints are healthy [13:22:27] RECOVERY - citoid endpoints health on scb2003 is OK: All endpoints are healthy [13:22:27] RECOVERY - citoid endpoints health on scb2005 is OK: All endpoints are healthy [13:22:27] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy [13:22:27] RECOVERY - citoid endpoints health on scb2006 is OK: All endpoints are healthy [13:27:39] Operations, DBA, Wikimedia-Site-requests, Patch-For-Review: Create CoC committee private wiki - 
https://phabricator.wikimedia.org/T165977#3282604 (Dereckson) Adding #DBA for the **PRIVATE** database part and to notify them we're creating a new private wiki. Adding #operations for Apache :80 redir... [13:30:34] Operations, DBA, Wikimedia-Site-requests, Patch-For-Review: Create CoC committee private wiki - https://phabricator.wikimedia.org/T165977#3282613 (Dereckson) a:Dereckson>jcrespo Jaime, I assign this task yo you to block it until you give us a green light replication to labs is disabled. I... [13:42:53] (PS1) Dereckson: Set initial configuration for techconduct.wikimedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/354985 (https://phabricator.wikimedia.org/T165977) [13:44:38] (CR) jerkins-bot: [V: -1] Set initial configuration for techconduct.wikimedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/354985 (https://phabricator.wikimedia.org/T165977) (owner: Dereckson) [13:55:35] (PS2) Dereckson: Set initial configuration for techconduct.wikimedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/354985 (https://phabricator.wikimedia.org/T165977) [14:07:37] Operations, DBA, Wikimedia-Site-requests, Patch-For-Review: Create CoC committee private wiki - https://phabricator.wikimedia.org/T165977#3282698 (Dereckson) p:Triage>Normal [14:41:53] (PS2) Ema: Move BGP classes to bgp.bgp, IP classes to bgp.ip [debs/pybal] - https://gerrit.wikimedia.org/r/354746 [14:43:19] (PS3) Ema: Move BGP classes to bgp.bgp, IP classes to bgp.ip [debs/pybal] - https://gerrit.wikimedia.org/r/354746 [14:45:30] (PS4) Ema: Move BGP classes to bgp.bgp, IP classes to bgp.ip [debs/pybal] - https://gerrit.wikimedia.org/r/354746 [14:48:50] (CR) Mark Bergsma: [C: 2] Move BGP classes to bgp.bgp, IP classes to bgp.ip [debs/pybal] - https://gerrit.wikimedia.org/r/354746 (owner: Ema) [14:51:53] (PS2) Amire80: [DON'T MERGE] Remove special Math extension settings for hewiki 
[mediawiki-config] - https://gerrit.wikimedia.org/r/353970 [14:57:47] (PS3) Amire80: Remove special Math extension settings for hewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/353970 [15:00:32] (CR) Mark Bergsma: Add netlink-based Ipvsmanager implementation (1 comment) [debs/pybal] (2.0-dev) - https://gerrit.wikimedia.org/r/354509 (owner: Giuseppe Lavagetto) [15:04:26] (Abandoned) Ema: Add unit tests for the Origin and BaseASPath BGP attributes [debs/pybal] - https://gerrit.wikimedia.org/r/354972 (owner: Ema) [15:05:37] (PS4) Filippo Giunchedi: prometheus: report puppet agent stats [puppet] - https://gerrit.wikimedia.org/r/354007 [15:05:39] (PS3) Filippo Giunchedi: base: report prometheus agent stats [puppet] - https://gerrit.wikimedia.org/r/354457 [15:05:41] (PS3) Filippo Giunchedi: prometheus: add alertmanager_url to prometheus server [puppet] - https://gerrit.wikimedia.org/r/354459 [15:05:43] (PS3) Filippo Giunchedi: role: use alertmanager in beta prometheus [puppet] - https://gerrit.wikimedia.org/r/354460 [15:05:45] (PS2) Filippo Giunchedi: role: set external url for prometheus beta [puppet] - https://gerrit.wikimedia.org/r/354975 [15:05:47] (PS2) Filippo Giunchedi: WIP prometheus::alertmanager [puppet] - https://gerrit.wikimedia.org/r/354976 [15:09:11] (CR) jerkins-bot: [V: -1] WIP prometheus::alertmanager [puppet] - https://gerrit.wikimedia.org/r/354976 (owner: Filippo Giunchedi) [15:25:31] (CR) Ema: [C: 1] Create new BGP message classes for incremental construction [debs/pybal] - https://gerrit.wikimedia.org/r/354684 (owner: Mark Bergsma) [15:32:34] (CR) Ema: [C: -1] Adapt NaiveBGPPeering to support UPDATE message overflow (1 comment) [debs/pybal] - https://gerrit.wikimedia.org/r/354686 (owner: Mark Bergsma) [15:34:54] (CR) Ema: Allow for withdrawals and NLRI to be sent in the same UPDATE (1 comment) [debs/pybal] - 
https://gerrit.wikimedia.org/r/354723 (owner: Mark Bergsma) [15:35:03] (CR) Ema: [C: 1] Add GPLv2 license header to bgp.py [debs/pybal] - https://gerrit.wikimedia.org/r/354955 (owner: Mark Bergsma) [15:35:33] (PS1) Ema: bgp: add a few unit tests [debs/pybal] - https://gerrit.wikimedia.org/r/355000 [15:37:06] moritzm: I am at the hackathon, but jfyi I'll follow up on the NFS ferm rules stuff tomorrow or Tuesday :) [15:39:26] (CR) Mark Bergsma: Allow for withdrawals and NLRI to be sent in the same UPDATE (1 comment) [debs/pybal] - https://gerrit.wikimedia.org/r/354723 (owner: Mark Bergsma) [15:52:35] (CR) Mark Bergsma: Adapt NaiveBGPPeering to support UPDATE message overflow (1 comment) [debs/pybal] - https://gerrit.wikimedia.org/r/354686 (owner: Mark Bergsma) [15:53:52] (PS6) Mark Bergsma: Adapt NaiveBGPPeering to support UPDATE message overflow [debs/pybal] - https://gerrit.wikimedia.org/r/354686 [15:53:54] (PS5) Mark Bergsma: Allow for withdrawals and NLRI to be sent in the same UPDATE [debs/pybal] - https://gerrit.wikimedia.org/r/354723 [15:53:56] (PS2) Mark Bergsma: Add GPLv2 license header to bgp.py [debs/pybal] - https://gerrit.wikimedia.org/r/354955 [15:54:51] (CR) Mark Bergsma: Adapt NaiveBGPPeering to support UPDATE message overflow (1 comment) [debs/pybal] - https://gerrit.wikimedia.org/r/354686 (owner: Mark Bergsma) [15:56:16] (CR) Mark Bergsma: [C: 2] bgp: add a few unit tests [debs/pybal] - https://gerrit.wikimedia.org/r/355000 (owner: Ema) [16:00:15] (CR) Mark Bergsma: [C: 2] Create new BGP message classes for incremental construction [debs/pybal] - https://gerrit.wikimedia.org/r/354684 (owner: Mark Bergsma) [16:01:09] (Merged) jenkins-bot: Create new BGP message classes for incremental construction [debs/pybal] - https://gerrit.wikimedia.org/r/354684 (owner: Mark Bergsma) [16:01:44] (CR) Mark Bergsma: [C: 2] Add GPLv2 license header to bgp.py 
[debs/pybal] - https://gerrit.wikimedia.org/r/354955 (owner: Mark Bergsma) [16:09:38] (CR) Mark Bergsma: [C: -1] Instrumentation fixes (1 comment) [debs/pybal] - https://gerrit.wikimedia.org/r/354680 (https://phabricator.wikimedia.org/T103882) (owner: Ema) [16:33:15] (PS2) Filippo Giunchedi: Test for unreferenced files introduced by changes [software/puppet-compiler] - https://gerrit.wikimedia.org/r/354939 [16:38:04] (PS3) Filippo Giunchedi: WIP prometheus::alertmanager [puppet] - https://gerrit.wikimedia.org/r/354976 [16:39:36] (CR) jerkins-bot: [V: -1] WIP prometheus::alertmanager [puppet] - https://gerrit.wikimedia.org/r/354976 (owner: Filippo Giunchedi) [16:49:56] ACKNOWLEDGEMENT - HP RAID on ms-be2029 is CRITICAL: CHECK_NRPE: Socket timeout after 50 seconds. nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T166021 [16:50:04] Operations, ops-codfw: Degraded RAID on ms-be2029 - https://phabricator.wikimedia.org/T166021#3283038 (ops-monitoring-bot) [16:52:13] (CR) Filippo Giunchedi: [C: 1] Puppet compiler: automatically sync from all masters [puppet] - https://gerrit.wikimedia.org/r/354105 (https://phabricator.wikimedia.org/T165583) (owner: Volans) [16:55:10] (CR) Filippo Giunchedi: Puppet: run-puppet-agent, add --failed-only option (1 comment) [puppet] - https://gerrit.wikimedia.org/r/349416 (owner: Volans) [17:24:45] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:52:45] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [18:05:45] PROBLEM - puppet last run on db1082 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [18:34:45] RECOVERY - puppet last run on db1082 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [18:49:13] (PS6) Mark Bergsma: Allow for withdrawals and NLRI to be sent in the same UPDATE [debs/pybal] - https://gerrit.wikimedia.org/r/354723 [18:49:15] (PS3) Mark Bergsma: Add GPLv2 license header to bgp.py [debs/pybal] - https://gerrit.wikimedia.org/r/354955 [20:48:55] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [21:17:55] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [22:24:21] Operations, RESTBase, Services, Wikimedia-Site-requests: Index page https://wikimedia.org/api/ is broken / RESTBase not discoverable - https://phabricator.wikimedia.org/T138848#3283225 (Krinkle)