[03:38:12] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 224, down: 1, dormant: 0, excluded: 0, unused: 0 [03:38:33] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0 [06:05:02] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0 [06:05:33] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0 [06:35:32] RECOVERY - MegaRAID on dbstore1002 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy [08:06:43] PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on einsteinium is CRITICAL: cluster={cache_text,cache_upload} site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [08:07:23] PROBLEM - HTTP availability for Varnish at ulsfo on einsteinium is CRITICAL: job={varnish-text,varnish-upload} site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 [08:07:43] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 124, down: 1, dormant: 0, excluded: 0, unused: 0 [08:08:03] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [08:08:53] PROBLEM - Router interfaces on cr2-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 75, down: 1, dormant: 0, excluded: 0, unused: 0 [08:09:12] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] 
https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5 [08:10:43] RECOVERY - HTTP availability for Varnish at ulsfo on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 [08:11:12] RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [08:12:36] (03CR) 10Subramanya Sastry: Replace Tidy with RemexHtml everywhere (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/442142 (https://phabricator.wikimedia.org/T175706) (owner: 10Subramanya Sastry) [08:16:53] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5 [08:17:53] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5 [08:25:42] PROBLEM - Check systemd state on rdb1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[08:30:11] (CR) Subramanya Sastry: Replace Tidy with RemexHtml everywhere (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/442142 (https://phabricator.wikimedia.org/T175706) (owner: Subramanya Sastry)
[08:34:10] (PS2) Subramanya Sastry: Replace Tidy with RemexHtml everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/442142 (https://phabricator.wikimedia.org/T175706)
[08:37:52] PROBLEM - Check systemd state on rdb1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[08:42:32] PROBLEM - Check systemd state on rdb1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[08:50:24] Operations, ops-eqiad, Analytics, User-Elukey: Degraded RAID on dbstore1002 - https://phabricator.wikimedia.org/T197707 (Marostegui) Open>Resolved a:Cmjohnson This is all good ``` root@dbstore1002:~# megacli -PDRbld -ShowProg -PhysDrv [32:5] -aALL Device(Encl-32 Slot-5) is not in reb...
[08:51:22] RECOVERY - Check systemd state on rdb1004 is OK: OK - running: The system is fully operational
[11:30:38] Hi. Can a non-ops person push a patch to the puppet repo and then have it deployed by ops?
[11:35:02] Nope, only ops can push and deploy
[11:36:15] Can you check T198371?
[11:36:15] T198371: "Error 404 page not found" for zh-my localized URLs on zh.wikipedia.org - https://phabricator.wikimedia.org/T198371
[11:37:32] Looks like the apache config file must be modified
[11:38:10] I can't, and ops are out for the weekend and will be back on Monday, though some may be lurking.
[11:39:40] How can I make ops aware of this task, for example by adding a tag?
[11:40:03] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 224, down: 1, dormant: 0, excluded: 0, unused: 0
[11:40:30] the operations tag
[11:40:53] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0
[11:41:51] thanks
[11:43:25] Operations, Wikimedia-General-or-Unknown, Chinese-Sites: "Error 404 page not found" for zh-my localized URLs on zh.wikipedia.org - https://phabricator.wikimedia.org/T198371 (RazeSoldier) I added this tag because this task is related to modifying the apache config file in the puppet repo.
[11:45:16] Operations, Chinese-Sites: "Error 404 page not found" for zh-my localized URLs on zh.wikipedia.org - https://phabricator.wikimedia.org/T198371 (RazeSoldier)
[11:47:32] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[11:47:52] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0
[11:58:10] Operations, ops-eqiad, Analytics, User-Elukey: Degraded RAID on dbstore1002 - https://phabricator.wikimedia.org/T197707 (elukey) Thanks a lot @Marostegui!
[12:04:25] (03PS1) 10Elukey: role::an_cluster::hadoop::master|slave: update max heap size threshold [puppet] - 10https://gerrit.wikimedia.org/r/443213 [12:08:55] (03CR) 10Elukey: [C: 032] "https://puppet-compiler.wmflabs.org/compiler02/11622/" [puppet] - 10https://gerrit.wikimedia.org/r/443213 (owner: 10Elukey) [13:35:07] (03PS1) 10Urbanecm: Support zh-my localized URLs [puppet] - 10https://gerrit.wikimedia.org/r/443215 (https://phabricator.wikimedia.org/T198371) [13:36:29] (03PS2) 10Urbanecm: Support zh-my localized URLs [puppet] - 10https://gerrit.wikimedia.org/r/443215 (https://phabricator.wikimedia.org/T198371) [13:37:06] (03PS3) 10Urbanecm: Support zh-my localized URLs in Wikipedia virtualhost [puppet] - 10https://gerrit.wikimedia.org/r/443215 (https://phabricator.wikimedia.org/T198371) [14:29:08] (03PS4) 10Paladox: Add icinga2 [puppet] - 10https://gerrit.wikimedia.org/r/351540 [14:29:14] Hmm [16:19:38] PROBLEM - Codfw HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=codfw&var-cache_type=All&var-status_type=5 [16:19:39] PROBLEM - Misc HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=misc&var-status_type=5 [16:22:06] graphite-labs, seems ok now [16:27:48] RECOVERY - NFS on labstore1006 is OK: TCP OK - 0.000 second response time on 208.80.154.7 port 2049 [16:28:04] !log labstore1006:~# service nfs-kernel-server restart [16:28:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:28:38] RECOVERY - Codfw HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] 
https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=codfw&var-cache_type=All&var-status_type=5 [16:30:49] RECOVERY - Misc HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=misc&var-status_type=5 [18:15:19] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1 [18:48:08] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200) [18:50:19] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy [18:53:48] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) timed out before a response was received [18:54:58] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy [19:00:38] PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200) [19:01:39] RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy [19:07:18] PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal 
source and target with seed returned the unexpected status 404 (expecting: 200) [19:09:28] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200) [19:09:29] RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy [19:10:38] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy [19:18:38] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) timed out before a response was received [19:21:50] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy [19:37:39] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200) [19:37:48] PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200) [19:39:59] RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy [19:42:18] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy [19:49:27] (03PS1) 10Alex Monk: Import .py from I832101d2 [software/certcentral] - 10https://gerrit.wikimedia.org/r/443234 [19:50:08] PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 
(expecting: 200) [19:50:18] (03CR) 10Alex Monk: [C: 032] "initial commit" [software/certcentral] - 10https://gerrit.wikimedia.org/r/443234 (owner: 10Alex Monk) [19:51:06] (03CR) 10Alex Monk: [V: 032 C: 032] "old fashioned merging" [software/certcentral] - 10https://gerrit.wikimedia.org/r/443234 (owner: 10Alex Monk) [19:53:29] RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy [20:01:28] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200) [20:01:55] (03PS3) 10MacFan4000: Remove MW 1.29 from ExtDist as it is now no longer supported [mediawiki-config] - 10https://gerrit.wikimedia.org/r/440745 [20:02:38] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy [20:03:06] (03PS1) 10Alex Monk: Make pylint a little happier, add acme_tiny [software/certcentral] - 10https://gerrit.wikimedia.org/r/443239 [20:03:37] (03CR) 10Alex Monk: [V: 032 C: 032] Make pylint a little happier, add acme_tiny [software/certcentral] - 10https://gerrit.wikimedia.org/r/443239 (owner: 10Alex Monk) [20:04:04] (03CR) 10Alex Monk: "Moving part of this into operations/software/certcentral.git" [puppet] - 10https://gerrit.wikimedia.org/r/441991 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk) [20:05:49] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200) [20:08:05] legoktm, about? 
[20:08:18] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy [20:12:30] 10Operations, 10Traffic, 10Continuous-Integration-Config: Set up CI for new repo operations/software/certcentral.git - https://phabricator.wikimedia.org/T198541 (10Krenair) p:05Triage>03Normal [20:14:58] PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200) [20:17:09] RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy [20:17:23] legoktm, I made https://phabricator.wikimedia.org/T198541 as I have no idea how that stuff works [20:17:36] 10Operations, 10Traffic, 10Continuous-Integration-Config: Set up CI for new repo operations/software/certcentral.git - https://phabricator.wikimedia.org/T198541 (10Krenair) p:05Normal>03Triage [20:30:29] PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200) [20:34:59] RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy [20:39:29] PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200) [20:39:29] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200) [20:40:48] RECOVERY - recommendation_api endpoints health on scb2003 is OK: All 
endpoints are healthy [20:45:18] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy [20:47:28] PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200) [20:48:29] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200) [20:49:39] RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy [20:52:07] 10Operations, 10Traffic, 10Continuous-Integration-Config: Set up CI for new repo operations/software/certcentral.git - https://phabricator.wikimedia.org/T198541 (10MarcoAurelio) I think we can use the `tox-docker` template but I reckon the repository needs a `tox.ini` file for this to work. 
[20:53:08] PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200) [20:54:18] RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy [20:54:18] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy [20:58:53] (03CR) 10ArielGlenn: [C: 032] writeuptopageid writes multiple output files [debs/mwbzutils] - 10https://gerrit.wikimedia.org/r/440839 (owner: 10ArielGlenn) [21:00:58] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200) [21:01:08] PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) timed out before a response was received [21:01:10] (03CR) 10ArielGlenn: [C: 032] generate temp stubs for page ranges serially from same input stub file [dumps] - 10https://gerrit.wikimedia.org/r/436956 (https://phabricator.wikimedia.org/T196063) (owner: 10ArielGlenn) [21:01:42] (03PS2) 10ArielGlenn: on dryrun, return the right number of results after (not) running command [dumps] - 10https://gerrit.wikimedia.org/r/440846 [21:02:27] (03CR) 10ArielGlenn: [C: 032] on dryrun, return the right number of results after (not) running command [dumps] - 10https://gerrit.wikimedia.org/r/440846 (owner: 10ArielGlenn) [21:03:18] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy [21:05:28] RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy [21:05:45] (03PS1) 10Alex Monk: Add a tox.ini [software/certcentral] - 
10https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541) [21:06:29] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200) [21:08:49] PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200) [21:08:58] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy [21:09:59] RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy [21:21:39] (03PS2) 10Alex Monk: Add a tox.ini [software/certcentral] - 10https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541) [21:22:18] PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200) [21:24:38] RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy [21:25:48] (03PS3) 10ArielGlenn: generate multiple temp stub files at once for larger wikis [dumps] - 10https://gerrit.wikimedia.org/r/442828 (https://phabricator.wikimedia.org/T196063) [21:28:27] (03PS3) 10MarcoAurelio: Add a tox.ini [software/certcentral] - 10https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541) (owner: 10Alex Monk) [21:28:35] (03CR) 10MarcoAurelio: "recheck" [software/certcentral] - 10https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541) (owner: 10Alex Monk) [21:29:42] (03CR) 10ArielGlenn: [C: 032] generate multiple temp stub files at once for larger 
wikis [dumps] - 10https://gerrit.wikimedia.org/r/442828 (https://phabricator.wikimedia.org/T196063) (owner: 10ArielGlenn) [21:29:56] (03PS4) 10ArielGlenn: generate multiple temp stub files at once for larger wikis [dumps] - 10https://gerrit.wikimedia.org/r/442828 (https://phabricator.wikimedia.org/T196063) [21:31:18] PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200) [21:32:27] (03PS4) 10MarcoAurelio: Add a tox.ini [software/certcentral] - 10https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541) (owner: 10Alex Monk) [21:32:44] !log ariel@deploy1001 Started deploy [dumps/dumps@a1bc510]: generate temp stubs smarter, T196063 [21:32:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:32:47] T196063: Be smart about creation of temp stub files for the corresponding page output content - https://phabricator.wikimedia.org/T196063 [21:35:11] !log ariel@deploy1001 Finished deploy [dumps/dumps@a1bc510]: generate temp stubs smarter, T196063 (duration: 02m 27s) [21:35:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:37:07] (03PS1) 10ArielGlenn: snapshot1001 is gone, no longer a scap target for dumps! [dumps/scap] - 10https://gerrit.wikimedia.org/r/443293 [21:37:59] RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy [21:40:12] (03CR) 10ArielGlenn: [V: 032 C: 032] snapshot1001 is gone, no longer a scap target for dumps! 
[dumps/scap] - 10https://gerrit.wikimedia.org/r/443293 (owner: 10ArielGlenn) [21:43:38] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200) [21:45:49] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy [21:49:18] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200) [21:51:00] (03CR) 10Hashar: "recheck" [software/certcentral] - 10https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541) (owner: 10Alex Monk) [21:52:48] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy [21:52:59] PROBLEM - https://grafana.wikimedia.org/dashboard/db/varnish-http-requests grafana alert on einsteinium is CRITICAL: CRITICAL: https://grafana.wikimedia.org/dashboard/db/varnish-http-requests is alerting: 70% GET drop in 30min alert. [21:53:12] (03CR) 10Hashar: "recheck" [software/certcentral] - 10https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541) (owner: 10Alex Monk) [21:53:49] PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200) [21:54:09] RECOVERY - https://grafana.wikimedia.org/dashboard/db/varnish-http-requests grafana alert on einsteinium is OK: OK: https://grafana.wikimedia.org/dashboard/db/varnish-http-requests is not alerting. 
[21:54:49] (CR) jerkins-bot: [V: -1] Add a tox.ini [software/certcentral] - https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541) (owner: Alex Monk)
[21:56:08] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[21:58:28] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy
[21:58:28] RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy
[21:59:51] (CR) MarcoAurelio: "@hashar: all those failures are related to this repo as-is now? Thanks." [software/certcentral] - https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541) (owner: Alex Monk)
[22:01:48] PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[22:03:38] Krenair: nope
[22:04:08] RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy
[22:07:38] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[22:07:52] Operations, Traffic, Continuous-Integration-Config, Patch-For-Review: Set up CI for new repo operations/software/certcentral.git - https://phabricator.wikimedia.org/T198541 (MarcoAurelio) So the CI is set up, but there are some errors that need to get fixed or ignored. Unfortunately, I cannot he...
[22:08:39] RECOVERY - recommendation_api endpoints health on scb2004 is OK: All endpoints are healthy
[22:24:08] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0] https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[22:30:20] (PS5) Alex Monk: Add a tox.ini [software/certcentral] - https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541)
[22:30:47] (CR) jerkins-bot: [V: -1] Add a tox.ini [software/certcentral] - https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541) (owner: Alex Monk)
[22:38:31] (PS6) Alex Monk: Add a tox.ini [software/certcentral] - https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541)
[22:38:57] (CR) jerkins-bot: [V: -1] Add a tox.ini [software/certcentral] - https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541) (owner: Alex Monk)
[22:39:18] Operations, Commons, Wikimedia-Site-requests: Please upload large file to Wikimedia Commons - https://phabricator.wikimedia.org/T192751 (Urbanecm)
[22:39:46] Operations, Traffic, Patch-For-Review: Create and deploy a centralized letsencrypt service - https://phabricator.wikimedia.org/T194962 (Krenair)
[22:39:51] Operations, Traffic, Continuous-Integration-Config, Patch-For-Review, User-MarcoAurelio: Set up CI for new repo operations/software/certcentral.git - https://phabricator.wikimedia.org/T198541 (Krenair) Open>Resolved a:Krenair>MarcoAurelio thanks guys, this looks ok
[22:40:19] Can somebody handle T192751 (and possibly T198543 too) please?
[22:40:19] T198543: Please upload large file to Wikimedia Commons - https://phabricator.wikimedia.org/T198543 [22:40:20] T192751: Please upload large file to Wikimedia Commons - https://phabricator.wikimedia.org/T192751 [22:57:44] (03PS7) 10Alex Monk: Add a tox.ini [software/certcentral] - 10https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541) [22:58:10] (03CR) 10jerkins-bot: [V: 04-1] Add a tox.ini [software/certcentral] - 10https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541) (owner: 10Alex Monk) [23:03:08] PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: instance=kubernetes1001.eqiad.wmnet operation_type={create_container,image_status,podsandbox_status,remove_container,start_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [23:03:18] PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: instance=kubernetes1002.eqiad.wmnet operation_type={container_status,create_container,image_status,podsandbox_status,remove_container,start_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [23:04:18] RECOVERY - kubelet operational latencies on kubernetes1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [23:04:28] RECOVERY - kubelet operational latencies on kubernetes1002 is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1 [23:09:08] (03PS8) 10Alex Monk: Add a tox.ini and do most cleanup to try to make this compliant with flake8 and pylint [software/certcentral] - 10https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541) [23:09:35] (03CR) 10jerkins-bot: [V: 04-1] Add a tox.ini and do most cleanup to try to make this compliant with flake8 and pylint [software/certcentral] - 10https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541) (owner: 10Alex Monk) [23:12:26] (03PS9) 10Alex Monk: Add a tox.ini and do most cleanup to try to make this compliant with flake8 and pylint [software/certcentral] - 10https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541) [23:12:51] (03CR) 10jerkins-bot: [V: 04-1] Add a tox.ini and do most cleanup to try to make this compliant with flake8 and pylint [software/certcentral] - 10https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541) (owner: 10Alex Monk) [23:13:37] (03PS10) 10Alex Monk: Add a tox.ini and do most cleanup to try to make this compliant with flake8 and pylint [software/certcentral] - 10https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541) [23:14:05] (03CR) 10jerkins-bot: [V: 04-1] Add a tox.ini and do most cleanup to try to make this compliant with flake8 and pylint [software/certcentral] - 10https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541) (owner: 10Alex Monk) [23:28:18] (03PS11) 10Alex Monk: Add a tox.ini and do most cleanup to try to make this compliant with flake8 and pylint [software/certcentral] - 10https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541) [23:28:46] (03CR) 10jerkins-bot: [V: 04-1] Add a tox.ini and do most cleanup to try to make this compliant with flake8 and pylint [software/certcentral] - 10https://gerrit.wikimedia.org/r/443291 
(https://phabricator.wikimedia.org/T198541) (owner: Alex Monk)
[23:33:54] (PS12) Alex Monk: Add a tox.ini and do most cleanup to try to make this compliant with flake8 and pylint [software/certcentral] - https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541)
[23:35:38] (CR) Alex Monk: [C: 2] Add a tox.ini and do most cleanup to try to make this compliant with flake8 and pylint [software/certcentral] - https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541) (owner: Alex Monk)
[23:36:10] (Merged) jenkins-bot: Add a tox.ini and do most cleanup to try to make this compliant with flake8 and pylint [software/certcentral] - https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541) (owner: Alex Monk)
[23:36:34] (CR) jenkins-bot: Add a tox.ini and do most cleanup to try to make this compliant with flake8 and pylint [software/certcentral] - https://gerrit.wikimedia.org/r/443291 (https://phabricator.wikimedia.org/T198541) (owner: Alex Monk)
[23:40:14] (CR) Legoktm: "Please add a license indicator to this file and the repository itself." [software/certcentral] - https://gerrit.wikimedia.org/r/443234 (owner: Alex Monk)
[23:44:06] legoktm, what's the ops/puppet license?
[23:44:23] it doesn't have one
[23:44:30] there's a bug for that
[23:44:38] lol
[23:44:44] in general most projects should default to GPL-2/3-or-later
[23:45:20] Since I wrote this code I can license it
[23:45:44] I can't license acme_tiny but that has a great big header at the top already
[23:46:18] yes that's fine
[23:53:10] (PS1) Alex Monk: license [software/certcentral] - https://gerrit.wikimedia.org/r/443297
[23:54:07] (CR) Alex Monk: [C: 2] license [software/certcentral] - https://gerrit.wikimedia.org/r/443297 (owner: Alex Monk)
[23:54:10] legoktm, ^
[23:54:29] (Merged) jenkins-bot: license [software/certcentral] - https://gerrit.wikimedia.org/r/443297 (owner: Alex Monk)
[23:55:10] Krenair: I think you can write better commit messages than that
[23:55:11] (CR) jenkins-bot: license [software/certcentral] - https://gerrit.wikimedia.org/r/443297 (owner: Alex Monk)
[23:55:14] but thank you
[23:58:12] I can
[23:58:29] but I don't have to get the git commit history of this thing through operations/puppet review
[23:58:38] just the actual source code at the end