[00:01:37] RECOVERY - Analytics Cassanda CQL query interface on aqs1002 is OK: TCP OK - 0.005 second response time on port 9042
[00:04:47] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 6 below the confidence bounds
[00:06:27] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[00:11:27] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 6 below the confidence bounds
[00:16:27] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[02:15:09] PROBLEM - nutcracker port on silver is CRITICAL: CRITICAL - Socket timeout after 2 seconds
[02:16:49] RECOVERY - nutcracker port on silver is OK: TCP OK - 0.000 second response time on port 11212
[02:24:59] !log l10nupdate@tin Synchronized php-1.27.0-wmf.2/cache/l10n: l10nupdate for 1.27.0-wmf.2 (duration: 08m 25s)
[02:25:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:29:50] !log l10nupdate@tin LocalisationUpdate completed (1.27.0-wmf.2) at 2015-10-18 02:29:50+00:00
[02:29:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:52:58] !log l10nupdate@tin Synchronized php-1.27.0-wmf.3/cache/l10n: l10nupdate for 1.27.0-wmf.3 (duration: 08m 31s)
[02:53:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:57:48] !log l10nupdate@tin LocalisationUpdate completed (1.27.0-wmf.3) at 2015-10-18 02:57:48+00:00
[02:57:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:52:17] PROBLEM - puppet last run on mw1206 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:04:57] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 1 below the confidence bounds
[04:16:47] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 6 below the confidence bounds
[04:17:59] PROBLEM - nutcracker port on silver is CRITICAL: CRITICAL - Socket timeout after 2 seconds
[04:19:29] RECOVERY - puppet last run on mw1206 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[04:19:38] RECOVERY - nutcracker port on silver is OK: TCP OK - 0.000 second response time on port 11212
[04:20:08] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 4 below the confidence bounds
[04:26:57] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 6 below the confidence bounds
[04:30:19] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 4 below the confidence bounds
[04:55:28] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 3 below the confidence bounds
[05:00:38] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 3 below the confidence bounds
[05:10:38] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 3 below the confidence bounds
[05:20:37] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 3 below the confidence bounds
[05:27:18] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 6 below the confidence bounds
[05:30:48] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 3 below the confidence bounds
[05:35:48] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 3 below the confidence bounds
[05:55:48] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 0 below the confidence bounds
[06:00:58] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 0 below the confidence bounds
[06:11:37] !log l10nupdate@tin ResourceLoader cache refresh completed at Sun Oct 18 06:11:37 UTC 2015 (duration 11m 36s)
[06:11:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:21:08] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 0 below the confidence bounds
[06:30:28] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:28] PROBLEM - puppet last run on cp2013 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:30:38] PROBLEM - puppet last run on mw1215 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:48] PROBLEM - puppet last run on lvs1003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:49] PROBLEM - puppet last run on cp4016 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:37] PROBLEM - puppet last run on sca1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:48] PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:59] PROBLEM - puppet last run on mw2023 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:28] PROBLEM - puppet last run on mw2018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:29] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:48] PROBLEM - puppet last run on mw2129 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:45:38] PROBLEM - nutcracker port on silver is CRITICAL: CRITICAL - Socket timeout after 2 seconds
[06:52:29] RECOVERY - nutcracker port on silver is OK: TCP OK - 0.000 second response time on port 11212
[06:54:58] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[06:56:07] RECOVERY - puppet last run on mw1215 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[06:56:08] RECOVERY - puppet last run on lvs1003 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[06:56:57] RECOVERY - puppet last run on sca1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:07] RECOVERY - puppet last run on mw2018 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[06:57:08] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[06:57:08] RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[06:57:18] RECOVERY - puppet last run on mw2023 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:28] RECOVERY - puppet last run on mw2129 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[06:57:29] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:38] RECOVERY - puppet last run on cp2013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:08] RECOVERY - puppet last run on cp4016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:49:37] PROBLEM - nutcracker port on silver is CRITICAL: CRITICAL - Socket timeout after 2 seconds
[07:51:17] RECOVERY - nutcracker port on silver is OK: TCP OK - 0.000 second response time on port 11212
[09:48:49] (CR) Alexandros Kosiaris: [C: 2 V: 2] Add initial Debian package for apertium-is-sv [debs/contenttranslation/apertium-is-sv] - https://gerrit.wikimedia.org/r/244405 (https://phabricator.wikimedia.org/T111902) (owner: KartikMistry)
[09:50:28] (CR) Alexandros Kosiaris: [C: 2 V: 2] Added initial Debian package for apertium-es-ro [debs/contenttranslation/apertium-es-ro] - https://gerrit.wikimedia.org/r/244183 (https://phabricator.wikimedia.org/T111902) (owner: KartikMistry)
[09:51:33] (CR) Alexandros Kosiaris: [C: 2 V: 2] Added initial Debian package for apertium-es-it [debs/contenttranslation/apertium-es-it] - https://gerrit.wikimedia.org/r/244170 (https://phabricator.wikimedia.org/T111902) (owner: KartikMistry)
[09:52:38] operations: Create an upload queue for reprepro - https://phabricator.wikimedia.org/T115349#1733149 (revi)
[09:59:08] PROBLEM - puppet last run on cp2013 is CRITICAL: CRITICAL: puppet fail
[10:04:31] !log uploaded to apt.wikimedia.org trusty-wikimedia: apertium-es-it_0.1.0~r51165-1
[10:04:32] !log uploaded to apt.wikimedia.org trusty-wikimedia: apertium-es-ro_0.7.3~r57551-1
[10:04:32] !log uploaded to apt.wikimedia.org trusty-wikimedia: apertium-is-sv_0.1.0~r56030-1
[10:04:32] !log uploaded to apt.wikimedia.org trusty-wikimedia: apertium-mlt-ara_0.1.0~r57554-1
[10:04:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:04:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:04:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:04:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:12:05] operations, Gerrit: Rename "Dzahn" to "Daniel Zahn" in Gerrit - https://phabricator.wikimedia.org/T113792#1733161 (QChris) > But only if i can also change the entire history, [...] If by 'history', you mean the Author and Committer fields in git, then you're basically out of luck. Git (not Gerrit) is usi...
[10:13:23] (PS1) Alexandros Kosiaris: package_builder: Keep environments updated [puppet] - https://gerrit.wikimedia.org/r/247084
[10:27:39] RECOVERY - puppet last run on cp2013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:44:57] PROBLEM - nutcracker port on silver is CRITICAL: CRITICAL - Socket timeout after 2 seconds
[11:46:29] RECOVERY - nutcracker port on silver is OK: TCP OK - 0.000 second response time on port 11212
[12:08:11] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 1 below the confidence bounds
[12:19:57] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 228, down: 1, dormant: 0, excluded: 0, unused: 0 xe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235, 34ms) {#2648} [10Gbps DWDM]
[12:23:09] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 1 below the confidence bounds
[12:31:40] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[12:32:08] PROBLEM - nutcracker port on silver is CRITICAL: CRITICAL - Socket timeout after 2 seconds
[12:35:28] RECOVERY - nutcracker port on silver is OK: TCP OK - 0.000 second response time on port 11212
[13:07:57] PROBLEM - Host cp1059 is DOWN: PING CRITICAL - Packet loss = 100%
[13:13:04] the eqiad-codfw alert is real but it's planned, don't worry about it
[13:13:15] as long as the eqord links are ok, there's no problem
[13:13:27] (PS1) Luke081515: Enable four new namespaces at thwikitionary [mediawiki-config] - https://gerrit.wikimedia.org/r/247088 (https://phabricator.wikimedia.org/T114458)
[14:01:29] PROBLEM - nutcracker port on silver is CRITICAL: CRITICAL - Socket timeout after 2 seconds
[14:03:09] RECOVERY - nutcracker port on silver is OK: TCP OK - 0.000 second response time on port 11212
[14:29:09] PROBLEM - puppet last run on cp2013 is CRITICAL: CRITICAL: puppet fail
[14:30:29] PROBLEM - puppet last run on db2055 is CRITICAL: CRITICAL: puppet fail
[14:57:18] RECOVERY - puppet last run on db2055 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:57:49] RECOVERY - puppet last run on cp2013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:58:09] operations, ops-eqiad, Patch-For-Review: cp1059 has network issues - https://phabricator.wikimedia.org/T114870#1733388 (BBlack) Downtime expired today, so I re-upped it until Oct 28th.
[15:21:19] (PS1) Luke081515: Rename two namespaces at bswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/247093 (https://phabricator.wikimedia.org/T115812)
[15:23:28] Hello, I have two new changes in Gerrit that modify InitialiseSettings.php. Is there someone who can take a look?
[15:24:32] It is nothing big: the first change adds 4 lines, the other 2
[15:27:42] Luke081515: please see https://wikitech.wikimedia.org/wiki/SWAT_deploys
[15:28:08] (CR) Glaisher: "$wgNamespaceAliases is not used for renaming (but should be used for backwards compatibility). See documentation. For renaming project tal" (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/247093 (https://phabricator.wikimedia.org/T115812) (owner: Luke081515)
[15:32:20] (CR) Glaisher: [C: -1] Rename two namespaces at bswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/247093 (https://phabricator.wikimedia.org/T115812) (owner: Luke081515)
[15:37:52] (CR) Glaisher: [C: -1] "Looks like you're using $wgNamespaceAliases here. Use $wgExtraNamespaces instead. You might want to read the documentation for config vari" [mediawiki-config] - https://gerrit.wikimedia.org/r/247088 (https://phabricator.wikimedia.org/T114458) (owner: Luke081515)
[15:40:12] (CR) Glaisher: Enable four new namespaces at thwikitionary (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/247088 (https://phabricator.wikimedia.org/T114458) (owner: Luke081515)
[15:48:17] (CR) Glaisher: [C: 1] Enable WikidataPageBanner on fr.wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/246169 (https://phabricator.wikimedia.org/T115023) (owner: Dereckson)
[15:51:11] (PS2) Luke081515: Enable four new namespaces at thwikitionary [mediawiki-config] - https://gerrit.wikimedia.org/r/247088 (https://phabricator.wikimedia.org/T114458)
[15:53:07] (CR) Glaisher: "This can probably be renamed to wg.. and removed from CommonSettings.php now." [mediawiki-config] - https://gerrit.wikimedia.org/r/246703 (owner: Bartosz Dziewoński)
[15:58:08] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: puppet fail
[15:58:17] (CR) Glaisher: "Note: ProofreadPage patch has been merged and would reach these wikis this week." [mediawiki-config] - https://gerrit.wikimedia.org/r/240640 (https://phabricator.wikimedia.org/T54709) (owner: Glaisher)
[16:09:10] (CR) Paladox: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/244739 (owner: Paladox)
[16:24:28] PROBLEM - nutcracker port on silver is CRITICAL: CRITICAL - Socket timeout after 2 seconds
[16:25:27] (CR) Glaisher: [C: -1] "Yeah, it might make sense to split this up." (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/240065 (https://phabricator.wikimedia.org/T104251) (owner: Mdann52)
[16:26:49] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[16:27:57] RECOVERY - nutcracker port on silver is OK: TCP OK - 0.000 second response time on port 11212
[17:27:50] operations, Release-Engineering-Team: Monitor Phabricator and Gerrit availability - https://phabricator.wikimedia.org/T115611#1733441 (MZMcBride) >>! In T115611#1730798, @greg wrote: > Are these enough? They seem like they give us "monitoring of errors and latency". I don't know much about the current mon...
[17:41:18] PROBLEM - YARN NodeManager Node-State on analytics1038 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:42:57] RECOVERY - YARN NodeManager Node-State on analytics1038 is OK: OK: YARN NodeManager analytics1038.eqiad.wmnet:8041 Node-State: RUNNING
[17:44:07] PROBLEM - Analytics Cassanda CQL query interface on aqs1003 is CRITICAL: Connection timed out
[17:49:07] RECOVERY - Analytics Cassanda CQL query interface on aqs1003 is OK: TCP OK - 0.001 second response time on port 9042
[17:54:18] PROBLEM - Analytics Cassanda CQL query interface on aqs1003 is CRITICAL: Connection timed out
[17:59:18] RECOVERY - Analytics Cassanda CQL query interface on aqs1003 is OK: TCP OK - 0.001 second response time on port 9042
[18:04:39] PROBLEM - Analytics Cassanda CQL query interface on aqs1003 is CRITICAL: Connection timed out
[18:07:58] RECOVERY - Analytics Cassanda CQL query interface on aqs1003 is OK: TCP OK - 0.001 second response time on port 9042
[18:16:38] PROBLEM - Analytics Cassanda CQL query interface on aqs1003 is CRITICAL: Connection timed out
[18:23:18] RECOVERY - Analytics Cassanda CQL query interface on aqs1003 is OK: TCP OK - 0.001 second response time on port 9042
[18:43:57] PROBLEM - Analytics Cassanda CQL query interface on aqs1003 is CRITICAL: Connection timed out
[18:48:19] PROBLEM - puppet last run on achernar is CRITICAL: CRITICAL: puppet fail
[18:53:58] RECOVERY - Analytics Cassanda CQL query interface on aqs1003 is OK: TCP OK - 0.999 second response time on port 9042
[18:59:19] PROBLEM - Analytics Cassanda CQL query interface on aqs1003 is CRITICAL: Connection timed out
[19:04:18] PROBLEM - Analytics Cassandra database on aqs1003 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (cassandra), command name java, args CassandraDaemon
[19:07:27] PROBLEM - puppet last run on dbstore2002 is CRITICAL: CRITICAL: puppet fail
[19:12:48] RECOVERY - Analytics Cassandra database on aqs1003 is OK: PROCS OK: 1 process with UID = 113 (cassandra), command name java, args CassandraDaemon
[19:15:38] RECOVERY - puppet last run on achernar is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[19:17:48] RECOVERY - Analytics Cassanda CQL query interface on aqs1003 is OK: TCP OK - 0.000 second response time on port 9042
[19:25:11] operations, Traffic, Patch-For-Review, Pybal: pybal fails to detect dead servers under production lb IPs for port 80 - https://phabricator.wikimedia.org/T113151#1733478 (mmodell) So even though the other end closed the connection properly, pybal doesn't find out until the keepalive timeout elapses?...
[19:34:27] RECOVERY - puppet last run on dbstore2002 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[19:38:05] (CR) TTO: [C: -1] Enable four new namespaces at thwikitionary (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/247088 (https://phabricator.wikimedia.org/T114458) (owner: Luke081515)
[19:41:47] (PS3) Luke081515: Enable four new namespaces at thwikitionary [mediawiki-config] - https://gerrit.wikimedia.org/r/247088 (https://phabricator.wikimedia.org/T114458)
[19:42:00] tto: Now better?
[19:43:14] Luke081515: no, it should be id => name not name => id
[19:43:22] you need to invert how you've set it out
[19:44:18] oh
[19:44:44] at first I used another variable, so that's a simple c&p error :-/
[19:45:10] (CR) John F. Lewis: [C: -1] "ack." [puppet] - https://gerrit.wikimedia.org/r/244814 (owner: John F. Lewis)
[19:46:18] (PS4) Luke081515: Enable four new namespaces at thwikitionary [mediawiki-config] - https://gerrit.wikimedia.org/r/247088 (https://phabricator.wikimedia.org/T114458)
[19:46:42] JohnFLewis: Should be fixed now
[19:47:11] looks fine
[19:47:18] * JohnFLewis isn't going to review it though
[20:52:31] operations, Wikimedia-General-or-Unknown, Database: hewiki's categorylinks shown as not empty though it is; purging does not help - https://phabricator.wikimedia.org/T115682#1733589 (eranroz) Referring to LinksDeletionUpdate.php - are there cleanup triggers configured on hewiki? are they enabled?
[21:02:17] PROBLEM - nutcracker port on silver is CRITICAL: CRITICAL - Socket timeout after 2 seconds
[21:02:42] operations, Wikimedia-General-or-Unknown, Database: hewiki's categorylinks shown as not empty though it is; purging does not help - https://phabricator.wikimedia.org/T115682#1733591 (Reedy) It's empty now. And purging it wouldn't help ever; you'd need to purge every file shown in it
[21:03:57] RECOVERY - nutcracker port on silver is OK: TCP OK - 0.000 second response time on port 11212
[21:08:15] operations, Wikimedia-General-or-Unknown, Database: hewiki's categorylinks shown as not empty though it is; purging does not help - https://phabricator.wikimedia.org/T115682#1733592 (eranroz) Categorylinks isn't empty. {{PAGESINCATEGORY:ויקיפדיה: למחיקה מהירה}} return 2 select * from categorylinks wher...
[21:23:33] operations, Wikimedia-General-or-Unknown, Database: hewiki's categorylinks shown as not empty though it is; purging does not help - https://phabricator.wikimedia.org/T115682#1733594 (Reedy) The hewiki master shows 7 ``` mysql:wikiadmin@db1062 [hewiki]> select * from categorylinks where cl_to like '%למ...
[21:26:07] operations, Wikimedia-General-or-Unknown, Database: hewiki's categorylinks shown as not empty though it is; purging does not help - https://phabricator.wikimedia.org/T115682#1733597 (eranroz) Most or all of the cl_from are deleted pages. e.g: ``` select * from categorylinks left join page on cl_from=pa...
[22:03:59] PROBLEM - puppet last run on mw1168 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:30:37] operations, Release-Engineering-Team: Monitor Phabricator and Gerrit availability - https://phabricator.wikimedia.org/T115611#1733626 (greg) Yes, icinga announces in IRC when either of those two things listed above (the icinga links) fails.
[22:32:19] RECOVERY - puppet last run on mw1168 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:11:17] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 230, down: 0, dormant: 0, excluded: 0, unused: 0
[23:23:50] operations, Wikimedia-General-or-Unknown, Database: hewiki's categorylinks shown as not empty though it is; purging does not help - https://phabricator.wikimedia.org/T115682#1733656 (aaron) Probably a duplicate of T115586
[23:36:39] PROBLEM - nutcracker port on silver is CRITICAL: CRITICAL - Socket timeout after 2 seconds
[23:38:19] RECOVERY - nutcracker port on silver is OK: TCP OK - 0.000 second response time on port 11212