[00:10:49] Operations, Graphite: Include ADD operation in memcached stats and grafana dashboard - https://phabricator.wikimedia.org/T201016 (aaron) I noticed that regular memcached counts ADD as it does SET (cmd_set). There is no cmd_add. However, mcrouter does seem to expose a cmd_add counter. Perhaps there can be...
[00:11:52] (CR) Dzahn: "just don't use $title as seed when having 2 crons in the same class as shown by http://puppet-compiler.wmflabs.org/11985/netmon1002.wikime" [puppet] - https://gerrit.wikimedia.org/r/450257 (https://phabricator.wikimedia.org/T190184) (owner: Dzahn)
[00:16:36] (PS5) Dzahn: postgresql::backup: don't run both crons at same minute [puppet] - https://gerrit.wikimedia.org/r/450257 (https://phabricator.wikimedia.org/T190184)
[00:17:17] (CR) jerkins-bot: [V: -1] postgresql::backup: don't run both crons at same minute [puppet] - https://gerrit.wikimedia.org/r/450257 (https://phabricator.wikimedia.org/T190184) (owner: Dzahn)
[00:37:33] (PS6) Dzahn: postgresql::backup: don't run both crons at same minute [puppet] - https://gerrit.wikimedia.org/r/450257 (https://phabricator.wikimedia.org/T190184)
[00:38:07] (CR) jerkins-bot: [V: -1] postgresql::backup: don't run both crons at same minute [puppet] - https://gerrit.wikimedia.org/r/450257 (https://phabricator.wikimedia.org/T190184) (owner: Dzahn)
[00:38:17] (PS7) Dzahn: postgresql::backup: don't run both crons at same minute [puppet] - https://gerrit.wikimedia.org/r/450257 (https://phabricator.wikimedia.org/T190184)
[00:38:55] (CR) jerkins-bot: [V: -1] postgresql::backup: don't run both crons at same minute [puppet] - https://gerrit.wikimedia.org/r/450257 (https://phabricator.wikimedia.org/T190184) (owner: Dzahn)
[00:40:47] (PS8) Dzahn: postgresql::backup: don't run both crons at same minute [puppet] - https://gerrit.wikimedia.org/r/450257 (https://phabricator.wikimedia.org/T190184)
[00:42:24] (CR) Dzahn: [C: 2] postgresql::backup: don't run both crons at same minute [puppet] - https://gerrit.wikimedia.org/r/450257 (https://phabricator.wikimedia.org/T190184) (owner: Dzahn)
[00:50:25] Operations, Patch-For-Review: Netbox: setup backups - https://phabricator.wikimedia.org/T190184 (Dzahn) Finally there should be no more cronspam :p ... and we have actual files: ``` root@netmon1002:/srv/postgres-backup# ls psql-all-dbs-20180803.sql.gz root@netmon2001:/srv/postgres-backup# ls psql-all-...
[00:52:03] Operations, Performance-Team, Traffic, Wikimedia-General-or-Unknown, SEO: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest - https://phabricator.wikimedia.org/T199252 (JKatzWMF) I am really excited to see this ticket in play. I also want to double-...
[01:47:03] Operations, Performance-Team, Traffic, Wikimedia-General-or-Unknown, SEO: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest - https://phabricator.wikimedia.org/T199252 (Legoktm) Ummm, this task is about fixing the Italian Wikipedia's search engine pr...
[02:54:25] Operations, netops: asw2-a-eqiad FPC5 gets disconnected every 10 minutes - https://phabricator.wikimedia.org/T201145 (ayounsi) Regular member restart didn't help. Juniper's tftp doesn't seem to support large files. So the USB method is the only option. asw2-a5 is on the loader> prompt and stable. Doesn...
[03:06:18] (PS2) Dzahn: postgresql::dump: unify commands in a single cron job [puppet] - https://gerrit.wikimedia.org/r/450261
[03:06:22] (Abandoned) Dzahn: postgresql::dump: unify commands in a single cron job [puppet] - https://gerrit.wikimedia.org/r/450261 (owner: Dzahn)
[03:11:34] (PS1) Dzahn: mariadb: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/450314
[03:12:31] (PS1) Dzahn: dnsrecursor: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/450315
[03:13:32] (PS1) Dzahn: postgres::master: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/450316
[03:15:14] (PS1) Dzahn: eventbus: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/450317
[03:17:00] (PS1) Dzahn: eventlogging/kafka::analytics: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/450318
[03:18:11] (PS1) Dzahn: memcached: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/450319
[04:40:05] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 53.33% of data above the critical threshold [140.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[05:21:24] Operations, Performance-Team, Traffic, Wikimedia-General-or-Unknown, SEO: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest - https://phabricator.wikimedia.org/T199252 (JKatzWMF) @Legoktm I don't mean to get in the way of fixing the specific issue wi...
[06:16:55] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[07:19:52] Operations, Performance-Team, Traffic, Wikimedia-General-or-Unknown, SEO: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest - https://phabricator.wikimedia.org/T199252 (ArielGlenn) When i chatted with @Imarlier via Hangout, we talked about running t...
[08:07:23] Operations, SRE-Access-Requests: analytics-privatedata-users access for Dario Rossi (username drossi) - https://phabricator.wikimedia.org/T201196 (Rossi.dario.g) Hello, Probably wrong understanding on my side: I though this was a name for a new wikitech account creation (I think I.don't have one, and su...
[08:22:47] Operations, Wikimedia-SVG-rendering, Upstream: librsvg misinterpret quoted font family names that contain whitespaces - https://phabricator.wikimedia.org/T64987 (Aklapper)
[09:54:53] (PS1) Framawiki: Add media.farsnews.com to wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/450391 (https://phabricator.wikimedia.org/T200872)
[10:11:55] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 26 probes of 310 (alerts on 19) - https://atlas.ripe.net/measurements/11645088/#!map
[10:17:05] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 14 probes of 310 (alerts on 19) - https://atlas.ripe.net/measurements/11645088/#!map
[13:35:21] (PS1) Ladsgroup: wikilabels: Enforce SSL [puppet] - https://gerrit.wikimedia.org/r/450395 (https://phabricator.wikimedia.org/T184437)
[13:35:57] (CR) jerkins-bot: [V: -1] wikilabels: Enforce SSL [puppet] - https://gerrit.wikimedia.org/r/450395 (https://phabricator.wikimedia.org/T184437) (owner: Ladsgroup)
[13:37:32] (PS2) Ladsgroup: wikilabels: Enforce SSL [puppet] - https://gerrit.wikimedia.org/r/450395 (https://phabricator.wikimedia.org/T184437)
[13:38:03] (CR) jerkins-bot: [V: -1] wikilabels: Enforce SSL [puppet] - https://gerrit.wikimedia.org/r/450395 (https://phabricator.wikimedia.org/T184437) (owner: Ladsgroup)
[13:43:07] (PS3) Ladsgroup: wikilabels: Enforce SSL [puppet] - https://gerrit.wikimedia.org/r/450395 (https://phabricator.wikimedia.org/T184437)
[13:43:43] (CR) jerkins-bot: [V: -1] wikilabels: Enforce SSL [puppet] - https://gerrit.wikimedia.org/r/450395 (https://phabricator.wikimedia.org/T184437) (owner: Ladsgroup)
[13:47:15] PROBLEM - Apache HTTP on mw1279 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:48:16] RECOVERY - Apache HTTP on mw1279 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.023 second response time
[14:02:28] Operations, Wikimedia-SVG-rendering, Upstream: librsvg misinterpret quoted font family names that contain whitespaces - https://phabricator.wikimedia.org/T64987 (Glrx)
[14:50:30] !log roll-restart logstash because of increasing packet loss
[14:50:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:10:09] * Krinkle staging on mwdebug1002/deploy1001
[16:37:32] !log krinkle@deploy1001 Synchronized php-1.32.0-wmf.15/includes/Storage/DerivedPageDataUpdater.php: I5cb93c173 [1/2] (duration: 01m 03s)
[16:37:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:40:55] !log krinkle@deploy1001 Synchronized php-1.32.0-wmf.15/includes/resourceloader/ResourceLoaderWikiModule.php: I5cb93c173 (duration: 00m 52s)
[16:40:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:03:50] (PS1) Rxy: Add suppressredirect permission to rollbacker and patroller at zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/450401 (https://phabricator.wikimedia.org/T201160)
[19:46:06] PROBLEM - HP RAID on db2054 is CRITICAL: CRITICAL: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Failed: 1I:1:7 - Controller: OK - Battery/Capacitor: OK
[19:46:08] ACKNOWLEDGEMENT - HP RAID on db2054 is CRITICAL: CRITICAL: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Failed: 1I:1:7 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T201245
[19:46:12] Operations, ops-codfw: Degraded RAID on db2054 - https://phabricator.wikimedia.org/T201245 (ops-monitoring-bot)
[20:08:16] PROBLEM - Device not healthy -SMART- on db2054 is CRITICAL: cluster=mysql device=cciss,6 instance=db2054:9100 job=node site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2054&var-datasource=codfw%2520prometheus%252Fops
[20:10:06] (PS4) Ladsgroup: wikilabels: Enforce SSL [puppet] - https://gerrit.wikimedia.org/r/450395 (https://phabricator.wikimedia.org/T184437)
[20:10:51] (CR) jerkins-bot: [V: -1] wikilabels: Enforce SSL [puppet] - https://gerrit.wikimedia.org/r/450395 (https://phabricator.wikimedia.org/T184437) (owner: Ladsgroup)
[20:12:17] (PS5) Ladsgroup: wikilabels: Enforce SSL [puppet] - https://gerrit.wikimedia.org/r/450395 (https://phabricator.wikimedia.org/T184437)
[20:12:48] (CR) jerkins-bot: [V: -1] wikilabels: Enforce SSL [puppet] - https://gerrit.wikimedia.org/r/450395 (https://phabricator.wikimedia.org/T184437) (owner: Ladsgroup)
[20:15:19] (PS6) Ladsgroup: wikilabels: Enforce SSL [puppet] - https://gerrit.wikimedia.org/r/450395 (https://phabricator.wikimedia.org/T184437)
[20:15:49] (CR) jerkins-bot: [V: -1] wikilabels: Enforce SSL [puppet] - https://gerrit.wikimedia.org/r/450395 (https://phabricator.wikimedia.org/T184437) (owner: Ladsgroup)
[20:22:13] (PS7) Ladsgroup: wikilabels: Enforce SSL [puppet] - https://gerrit.wikimedia.org/r/450395 (https://phabricator.wikimedia.org/T184437)
[20:23:23] (CR) Ladsgroup: "I'm stupidest man alive, forgot the comma and was fixing other things...." [puppet] - https://gerrit.wikimedia.org/r/450395 (https://phabricator.wikimedia.org/T184437) (owner: Ladsgroup)
[20:30:07] PROBLEM - puppet last run on labvirt1008 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/profile.d/mysql-ps1.sh]
[20:35:17] RECOVERY - puppet last run on labvirt1008 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[21:35:35] PROBLEM - HHVM rendering on mw1256 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:36:35] RECOVERY - HHVM rendering on mw1256 is OK: HTTP OK: HTTP/1.1 200 OK - 75247 bytes in 0.102 second response time