[00:01:40] RECOVERY - Check systemd state on an-launcher1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:47:46] (CR) Andrew Bogott: [C: +2] Openstack hiera: remove keystone_host hiera setting [puppet] - https://gerrit.wikimedia.org/r/591525 (owner: Andrew Bogott)
[02:54:04] (PS1) Andrew Bogott: Designate hiera: removed a comment about a spof that is no longer a spof [puppet] - https://gerrit.wikimedia.org/r/592183
[02:54:55] (CR) Andrew Bogott: [C: +2] Designate hiera: removed a comment about a spof that is no longer a spof [puppet] - https://gerrit.wikimedia.org/r/592183 (owner: Andrew Bogott)
[03:02:46] (PS1) Andrew Bogott: cloud-vps puppetmaster: add a missing ) [puppet] - https://gerrit.wikimedia.org/r/592184 (https://phabricator.wikimedia.org/T249941)
[03:14:44] (CR) Andrew Bogott: [C: +2] cloud-vps puppetmaster: add a missing ) [puppet] - https://gerrit.wikimedia.org/r/592184 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)
[03:18:16] RECOVERY - Check systemd state on labtestpuppetmaster2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[03:22:03] (PS2) Andrew Bogott: ordered_json.rb: add a new function, verbose_ordered_json [puppet] - https://gerrit.wikimedia.org/r/589741
[03:22:05] (PS2) Andrew Bogott: mcrouter: get some newlines in the mcrouter config [puppet] - https://gerrit.wikimedia.org/r/589742
[03:24:07] (CR) jerkins-bot: [V: -1] ordered_json.rb: add a new function, verbose_ordered_json [puppet] - https://gerrit.wikimedia.org/r/589741 (owner: Andrew Bogott)
[03:25:47] (PS3) Andrew Bogott: ordered_json.rb: add a new function, ordered_json_verbose [puppet] - https://gerrit.wikimedia.org/r/589741
[03:25:49] (PS3) Andrew Bogott: mcrouter: get some newlines in the mcrouter config [puppet] - https://gerrit.wikimedia.org/r/589742
[03:27:43] (CR) jerkins-bot: [V: -1] ordered_json.rb: add a new function, ordered_json_verbose [puppet] - https://gerrit.wikimedia.org/r/589741 (owner: Andrew Bogott)
[06:32:18] PROBLEM - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS2914/IPv4: Active - NTT, AS2914/IPv6: Active - NTT https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[06:45:24] PROBLEM - OSPF status on cr2-eqord is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:45:30] PROBLEM - BFD status on cr2-eqord is CRITICAL: CRIT: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[06:45:40] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 269, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:46:12] PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:49:24] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 271, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:49:54] RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:51:00] RECOVERY - OSPF status on cr2-eqord is OK: OSPFv2: 4/4 UP : OSPFv3: 4/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:51:06] RECOVERY - BFD status on cr2-eqord is OK: OK: UP: 8 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:00:04] Deploy window No Deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200423T0700)
[07:19:18] PROBLEM - snapshot of s3 in eqiad on db1115 is CRITICAL: snapshot for s3 at eqiad taken more than 3 days ago: Most recent backup 2020-04-20 06:46:44 https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[07:24:46] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[07:37:36] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is OK: HTTP OK: HTTP/1.0 200 OK - 22725 bytes in 0.259 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[07:42:42] RECOVERY - BGP status on cr1-eqiad is OK: BGP OK - up: 38, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[08:06:52] cdanis: around?
[08:07:09] or any WMF ops around?
[08:14:19] (PS9) Ayounsi: Netbox driven switch interfaces configuration [homer/public] - https://gerrit.wikimedia.org/r/547584 (https://phabricator.wikimedia.org/T250429)
[08:14:21] (PS1) Ayounsi: Netbox driven routers disabled interfaces [homer/public] - https://gerrit.wikimedia.org/r/592246
[08:16:50] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[08:25:56] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is OK: HTTP OK: HTTP/1.0 200 OK - 22728 bytes in 0.256 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[08:41:45] (PS1) Jcrespo: transfer.py: Let netcat listening timeout after 10 seconds [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/592247
[08:42:07] (CR) jerkins-bot: [V: -1] transfer.py: Let netcat listening timeout after 10 seconds [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/592247 (owner: Jcrespo)
[08:48:58] (PS2) Jcrespo: transfer.py: Let netcat's listen instance timeout after 10 seconds [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/592247
[08:49:21] (CR) jerkins-bot: [V: -1] transfer.py: Let netcat's listen instance timeout after 10 seconds [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/592247 (owner: Jcrespo)
[08:49:37] (PS3) Jcrespo: transfer.py: Let netcat's listen instance timeout after 10 seconds [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/592247
[08:49:59] (CR) jerkins-bot: [V: -1] transfer.py: Let netcat's listen instance timeout after 10 seconds [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/592247 (owner: Jcrespo)
[08:52:07] (PS4) Jcrespo: transfer.py: Let netcat's listen instance timeout after 10 seconds [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/592247
[09:01:13] (PS1) Ayounsi: Chassis: more generic, add ae count [homer/public] - https://gerrit.wikimedia.org/r/592251
[09:02:15] (Abandoned) Ayounsi: Add partial chassis support for asw and cr [homer/public] - https://gerrit.wikimedia.org/r/550389 (owner: Ayounsi)
[09:09:46] PROBLEM - snapshot of s3 in codfw on db1115 is CRITICAL: snapshot for s3 at codfw taken more than 3 days ago: Most recent backup 2020-04-20 09:03:14 https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[09:16:55] (PS4) Ayounsi: WMF specific Netbox plugin for interfaces config [software/homer/deploy] - https://gerrit.wikimedia.org/r/589406 (https://phabricator.wikimedia.org/T250429)
[09:37:34] (CR) Urbanecm: [C: -1] Set $wgArticleCount to 'any' on trwikisource (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/591771 (https://phabricator.wikimedia.org/T248747) (owner: RhinosF1)
[09:39:07] Urbanecm: heh, Why I did not see that even after I looked to try and work out the order I don't know, doing...
[09:40:11] (PS3) RhinosF1: Set $wgArticleCount to 'any' on trwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/591771 (https://phabricator.wikimedia.org/T248747)
[09:40:44] (CR) RhinosF1: Set $wgArticleCount to 'any' on trwikisource (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/591771 (https://phabricator.wikimedia.org/T248747) (owner: RhinosF1)
[09:45:21] rxy: hi! The WMF is on holidays during the next days, so few people are around (we are available for emergencies)
[09:45:24] anything going on?
[09:45:36] situation is now off
[09:45:40] sorry for calling.
[09:46:01] nono thanks for checking the status of the websites :)
[10:02:47] * RhinosF1 wonders if the wmf ever really goes on a proper holiday
[10:20:24] RECOVERY - snapshot of s3 in eqiad on db1115 is OK: Last snapshot for s3 at eqiad (db1095.eqiad.wmnet:3313) taken on 2020-04-23 08:14:43 (857 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[11:47:52] (PS1) Elukey: Refactor the exporter to support metrics specs via config file [software/druid_exporter] - https://gerrit.wikimedia.org/r/592261
[11:48:07] self nerd snipe --^
[11:48:49] (PS2) Elukey: Refactor the exporter to support metrics specs via config file [software/druid_exporter] - https://gerrit.wikimedia.org/r/592261
[12:15:52] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3056 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:23:12] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3056 is OK: HTTP OK: HTTP/1.0 200 OK - 22745 bytes in 0.255 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:36:02] PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/transform/wikitext/to/html/{title} (Transform wikitext to html) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[12:37:48] RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[13:04:12] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[13:11:34] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is OK: HTTP OK: HTTP/1.0 200 OK - 22731 bytes in 0.257 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[14:49:00] RECOVERY - snapshot of s3 in codfw on db1115 is OK: Last snapshot for s3 at codfw (db2098.codfw.wmnet:3313) taken on 2020-04-23 11:36:52 (855 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[16:03:46] Operations, Services, Service-deployment-requests, artificial-intelligence: New Service Request 'open_nsfw' - https://phabricator.wikimedia.org/T250110 (Chtnnh) Thanks so much @Lazy-restless 😄
[16:27:16] (PS1) Andrew Bogott: Horizon: replace nutcracker with mcrouter [puppet] - https://gerrit.wikimedia.org/r/592276
[16:30:30] (CR) jerkins-bot: [V: -1] Horizon: replace nutcracker with mcrouter [puppet] - https://gerrit.wikimedia.org/r/592276 (owner: Andrew Bogott)
[16:32:41] (PS2) Andrew Bogott: Horizon: replace nutcracker with mcrouter [puppet] - https://gerrit.wikimedia.org/r/592276
[16:39:38] (PS3) Andrew Bogott: Horizon: replace nutcracker with mcrouter [puppet] - https://gerrit.wikimedia.org/r/592276
[16:45:30] (PS4) Andrew Bogott: Horizon: replace nutcracker with mcrouter [puppet] - https://gerrit.wikimedia.org/r/592276
[16:49:03] (CR) Andrew Bogott: "pcc results at https://puppet-compiler.wmflabs.org/compiler1001/22135/" [puppet] - https://gerrit.wikimedia.org/r/592276 (owner: Andrew Bogott)
[18:24:50] PROBLEM - Host wikitech-static.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[18:28:30] RECOVERY - Host wikitech-static.wikimedia.org is UP: PING WARNING - Packet loss = 0%, RTA = 828.28 ms
[19:21:07] (CR) Urbanecm: [C: +1] "Thank you" [mediawiki-config] - https://gerrit.wikimedia.org/r/591771 (https://phabricator.wikimedia.org/T248747) (owner: RhinosF1)
[19:23:19] (CR) Urbanecm: [C: -1] VisualEditor: Allow external link paste on officewiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/591416 (owner: Esanders)
[19:34:08] Urbanecm: your welcome!
[20:42:22] Hi, I'm an enwiki checkuser, and I need help from someone with database access. Is anyone around that I can PM and explain the situation?
[20:42:29] ST47: PM me
[22:06:38] !log Perform timeouting rename at enwiki Wikipedia talk:Introduction --> Wikipedia talk:Introduction (historical) using moveBatch.php ([[:meta:Special:Diff/20009402|request]])
[22:06:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:02:04] (PS1) Zoranzoki21: Add two domains in wgCopyUploadDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/592306 (https://phabricator.wikimedia.org/T205903)
[23:03:53] (PS2) Zoranzoki21: Add two domains in wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/592306 (https://phabricator.wikimedia.org/T205903)
[23:06:39] (PS3) Zoranzoki21: Add two domains in wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/592306 (https://phabricator.wikimedia.org/T250903)
[23:53:02] PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (Scrapes sample page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[23:56:44] RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid