[00:00:00] New patchset: Nemo bis; "(bug 46589) Add localised/v2 logos for Wikipedias without one (second installment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56084 [00:08:56] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 00:08:47 UTC 2013 [00:08:56] RECOVERY - Puppet freshness on db1052 is OK: puppet ran at Wed Mar 27 00:08:51 UTC 2013 [00:09:45] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [00:10:24] RECOVERY - Puppet freshness on mw13 is OK: puppet ran at Wed Mar 27 00:10:21 UTC 2013 [00:10:54] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 00:10:46 UTC 2013 [00:11:44] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [00:12:44] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 00:12:35 UTC 2013 [00:12:44] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [00:14:34] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 00:14:30 UTC 2013 [00:14:44] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [00:20:05] PROBLEM - Puppet freshness on virt1005 is CRITICAL: Puppet has not run in the last 10 hours [00:27:37] New review: Asher; "(2 comments)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52606 [00:29:45] Change merged: Asher; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56081 [00:30:54] !log asher synchronized wmf-config/db-eqiad.php 'pulling db1001, adding db1052 at a warmup weight' [00:31:01] Logged the message, Master [00:34:24] !log asher synchronized wmf-config/db-eqiad.php 'db1052 to full weight' [00:34:31] Logged the message, Master [00:39:51] PROBLEM - RAID on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:41:21] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:41:21] PROBLEM - swift-account-auditor on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:41:31] PROBLEM - SSH on ms-be3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:41:31] PROBLEM - swift-container-replicator on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:41:31] PROBLEM - swift-container-updater on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:41:31] PROBLEM - DPKG on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:41:31] PROBLEM - swift-object-replicator on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:41:32] PROBLEM - swift-object-updater on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:41:32] PROBLEM - swift-account-reaper on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:41:51] PROBLEM - swift-object-server on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:41:51] PROBLEM - swift-account-replicator on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:42:01] PROBLEM - swift-container-server on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:42:11] PROBLEM - swift-account-server on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:42:11] PROBLEM - swift-object-auditor on ms-be3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:52:35] grrr i look at ms-be3 [00:53:16] yep, looks ike swift crashed [00:54:02] !log rebooting ms-be3 [00:54:09] Logged the message, Mistress of the network gear. [00:54:31] PROBLEM - NTP on ms-be3 is CRITICAL: NTP CRITICAL: No response from NTP server [00:55:21] !log ms-be3 was crashed with stack traces for swift-container [00:55:27] Logged the message, Mistress of the network gear. [00:56:41] RECOVERY - swift-account-replicator on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [00:56:41] RECOVERY - swift-object-server on ms-be3 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [00:56:51] RECOVERY - swift-container-server on ms-be3 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [00:57:02] RECOVERY - swift-account-server on ms-be3 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [00:57:12] RECOVERY - swift-object-auditor on ms-be3 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [00:57:12] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [00:57:12] RECOVERY - swift-account-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [00:57:21] RECOVERY - SSH on ms-be3 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [00:57:21] RECOVERY - swift-container-updater on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [00:57:21] RECOVERY - swift-container-replicator on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [00:57:21] RECOVERY - DPKG on ms-be3 is OK: All packages OK [00:57:31] RECOVERY - swift-object-updater on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [00:57:31] RECOVERY - swift-account-reaper on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [00:57:31] RECOVERY - swift-object-replicator on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [01:06:19] New patchset: Nemo bis; "(bug 46589) Add localised/v2 logos for Wikipedias without one (second installment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56084 [01:08:04] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [01:10:14] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [01:14:11] New patchset: Nemo bis; "(bug 46589) Add localised/v2 logos for Wikipedias without one (second installment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56084 [01:16:00] Change abandoned: Nemo bis; "(no reason)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56084 [01:31:16] New patchset: Nemo bis; "(bug 46589) Add localised/v2 logos for Wikipedias without one (second installment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56097 [01:36:10] New patchset: Odder; "Add tz database time zone settings for wikis in Maldivian language Adding tz database time zone settings for dv.wikipedia and dv.wiktionary ('Indian/Maldives' = UTC+5:00). 
Bug: 46351" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56098 [01:48:46] New patchset: Odder; "Add tz database time zone settings for wikis in Maldivian language. Removing unnecessary comma from the end of the line Bug: 46351" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56098 [02:04:57] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [02:07:08] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [02:08:37] PROBLEM - Puppet freshness on mw1099 is CRITICAL: Puppet has not run in the last 10 hours [02:09:39] PROBLEM - Puppet freshness on mw1025 is CRITICAL: Puppet has not run in the last 10 hours [02:09:39] PROBLEM - Puppet freshness on mw1121 is CRITICAL: Puppet has not run in the last 10 hours [02:10:01] New review: Odder; "All the new links are working; the files have also been protected on Commons against moves and reupl..." [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/56097 [02:10:37] PROBLEM - Puppet freshness on mw1130 is CRITICAL: Puppet has not run in the last 10 hours [02:14:37] PROBLEM - Puppet freshness on mw1120 is CRITICAL: Puppet has not run in the last 10 hours [02:15:37] PROBLEM - Puppet freshness on mw1160 is CRITICAL: Puppet has not run in the last 10 hours [02:19:11] !log LocalisationUpdate completed (1.21wmf12) at Wed Mar 27 02:19:11 UTC 2013 [02:19:18] Logged the message, Master [02:23:39] PROBLEM - Puppet freshness on mw1001 is CRITICAL: Puppet has not run in the last 10 hours [02:44:56] !log LocalisationUpdate completed (1.21wmf11) at Wed Mar 27 02:44:56 UTC 2013 [02:45:02] Logged the message, Master [02:47:22] wikibugs has gone quiet. [02:47:37] And there are reports of Gerrit e-mail strangeness. [02:50:01] Hmm, gerrit-wm seems fine in #mediawiki. [03:03:27] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [03:05:37] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [03:20:35] New patchset: Odder; "(bug 45066) Disable anonymous page creation at tr.wikipedia Disabling page creation for anonymous users on the Turkish Wikipedia per community consensus. Bug: 45066" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56101 [03:42:32] New review: Odder; "Please set $wgAutoConfirmAge = 345600 per https://bugzilla.wikimedia.org/show_bug.cgi?id=44285#c3 an..." 
[operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/56055 [04:06:44] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [04:07:54] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [04:08:04] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 04:07:55 UTC 2013 [04:08:44] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [04:09:15] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 04:09:11 UTC 2013 [04:09:44] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [04:10:24] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 04:10:23 UTC 2013 [04:10:47] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [04:11:24] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 04:11:20 UTC 2013 [04:11:44] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [04:12:14] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 04:12:13 UTC 2013 [04:12:44] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [04:13:04] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 04:12:58 UTC 2013 [04:13:45] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [04:16:44] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 04:16:43 UTC 2013 [04:17:45] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [04:51:07] PROBLEM - Puppet freshness on virt2 is CRITICAL: Puppet has not run in the last 10 hours [04:55:33] !log purging BayesLearning files older than 70 days on mchenry [04:55:39] Logged the message, Master [04:56:07] PROBLEM - Puppet freshness on virt3 is CRITICAL: Puppet has not run in the last 10 hours [05:01:57] !log disabling Bayes-collector cron on sanger [05:02:03] Logged the message, Master [05:15:08] New patchset: Tim Starling; "Add a cron job to clean up old MW logs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/55003 [05:19:59] !log on fluorine: running mw-log-cleanup once to test Id2196e7b [05:20:05] Logged the message, Master [05:25:01] New patchset: Tim Starling; "Add a cron job to clean up old MW logs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/55003 [05:26:53] New review: Tim Starling; "PS3: set executable bit on script in git" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/55003 [05:26:55] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/55003 [05:33:49] New patchset: Tim Starling; "Maybe not once per minute from 02:00 to 02:59" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56103 [05:35:07] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56103 [06:04:17] New patchset: Tim Starling; "Move scap source location from fenari to tin" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56104 [06:04:48] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [06:06:58] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [06:10:46] New patchset: Tim Starling; "In sync-dir, actually perform the syntax check" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56105 [06:31:30] RECOVERY - Puppet freshness 
on db11 is OK: puppet ran at Wed Mar 27 06:31:24 UTC 2013 [06:31:48] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [06:49:45] New patchset: Tim Starling; "Basic puppetization of dsh" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56107 [06:49:47] New patchset: Tim Starling; "Remove some node lists" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56108 [07:04:21] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [07:06:31] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [07:10:41] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [07:14:41] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 07:14:31 UTC 2013 [07:15:21] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [08:06:38] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [08:07:48] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 08:07:42 UTC 2013 [08:08:38] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [08:08:49] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [08:10:48] PROBLEM - Puppet freshness on mw1095 is CRITICAL: Puppet has not run in the last 10 hours [08:14:38] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 08:14:30 UTC 2013 [08:14:38] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [08:23:48] PROBLEM - Puppet freshness on mw62 is CRITICAL: Puppet has not run in the last 10 hours [08:24:48] PROBLEM - Puppet freshness on mw1135 is CRITICAL: Puppet has not run in the last 10 hours [08:35:07] PROBLEM - Puppet freshness on mw51 is CRITICAL: Puppet has not run in the last 10 hours [09:05:12] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [09:06:22] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [09:31:46] New patchset: Odder; "(bug 43863) Enabled wgImportSources on the Spanish Wikivoyage. Added eswiki, meta, commons, en.voy, de.voy, fr.voy, it.voy, nl.voy, pt.voy, ru.voy, and sv.voy" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56113 [09:36:59] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Puppet has not run in the last 10 hours [09:36:59] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Puppet has not run in the last 10 hours [09:36:59] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours [09:54:40] wb Nikerabbit [09:55:50] Nemo_bis: uga [09:56:17] !log Testing Translate bug fixes on test.wikipedia.org [09:56:25] Logged the message, Master [10:05:07] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [10:07:18] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [10:08:17] PROBLEM - Puppet freshness on cp3010 is CRITICAL: Puppet has not run in the last 10 hours [10:14:03] New review: Nemo bis; "Leslie, what's stopping this? Apart from moving to hume as you said, do I need to do something else?" 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/37441 [10:16:18] Nikerabbit: wikiapiary compensates the lack of WMF monitoring in part, though http://wikiapiary.com/wiki/Wikipedia_Test_Wiki [10:17:17] with low resolution though [10:20:17] PROBLEM - Puppet freshness on virt1005 is CRITICAL: Puppet has not run in the last 10 hours [10:31:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:32:04] New patchset: Odder; "(bug 45638) Modify user group rights on it.wikivoyage Modified wgAddGroups and wgRemoveGroups; changed user rights for autoconfirmed, added patroller group." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56118 [10:32:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [11:04:27] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [11:06:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:06:38] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [11:07:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [11:13:59] ugh [11:14:10] change propagation lag on wikidata is rising fast [11:14:33] anyone around to approve and deploy https://gerrit.wikimedia.org/r/#/c/55904/ ? [11:15:28] uga [11:24:20] hi hashar [11:24:26] lo [11:33:37] it looks like these jobs are expected to overlap, am I understanding that right? how does that work? [11:34:09] are there any stats what is the waiting time for jobs to be run? [11:35:24] I guess I need to understand more about what happens once they get to the dispatcher log [11:39:07] apergos: yes, correct [11:39:39] apergos: piping to the logs is indeed a problem though :/ [11:41:03] so if they can run for 900 (I guess seconds) and they run every 5 minutes (= 300 seconds) isn't that going to be a problem with multiple invocations trying to write at the same time? [11:43:39] apergos: write where? [11:44:10] apergos: there is no problem on the database level. the dispatcher is specifically designed to run multiple instances in parallel. [11:44:12] into /var/log/wikidata/dispatcher*.log [11:44:24] that depends on how the OS handles pipes, i guess [11:44:29] that *might* be a problem [11:44:38] as in: some log lines getting lost [11:44:45] but nothing critical [11:44:58] i'm trying to test this locally. aude said it works fine on her setup [11:45:33] well you would want to test it with cases where you have overlap [11:45:54] or just run two of them with the redirect in two different sessions and see what it does [11:46:36] I expect you are going to have some garbled log entries; do you use these for anything? (if not maybe you want to redirect them to /dev/null instead) [11:47:05] New patchset: Odder; "(bug 46182) Set LQT as opt-out on se.wikimedia (chapter wiki)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56129 [11:47:05] New patchset: Odder; "(bug 46182) Set LQT as opt-out on se.wikimedia (chapter wiki)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56130 [11:47:26] hi [11:47:32] apergos: we need the logs for now, yes. they tell us how, why and where stuff is lagging [11:47:46] apergos: some lines may get garbled. 
i'm not too worried about that [11:47:52] I see [11:47:54] it's not like anything is absolutely relying on the, [11:47:57] *them [11:48:01] so no one is running stats off of them or anything [11:48:07] a quick test on my local box shows no issues [11:48:08] Change abandoned: Odder; "(no reason)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56130 [11:48:12] apergos: right, no stats [11:48:16] ok [11:48:17] apergos: no, we do stats off the database directly [11:48:19] just to see if something is going wrong [11:49:10] * aude has in my crontab [11:49:11] /usr/local/bin/mwscript extensions/Wikibase/lib/maintenance/dispatchChanges.php --wiki enwikidata --max-time 900 2>&1 >> /var/log/wikibase/dispatcher.log [11:49:29] same way done on hume, except log file might be in a different place [11:49:49] every 5 min (and i have 2 cronjobs) [11:50:01] * aude not stress testing it though, but needs to do that [11:50:02] Change restored: Odder; "Proper version with bug number." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56130 [11:50:09] apergos: is there a way to run jobqueue manually for a wiki? [11:50:14] Change abandoned: Odder; "(no reason)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56129 [11:50:35] there is a way to run the next set of jobs for a given wiki yes [11:50:58] it didn't merge it [11:52:11] I wonder what's changed in the workflow now [11:53:37] it's verified by jenkins, +2 by me, what more can it want? [11:57:24] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/55904 [11:57:33] tried again and it took it [11:57:34] weird [11:57:56] thanks apergos [11:58:22] live next time puppet runs [11:58:24] yay! [11:59:14] apergos: thanks! [12:01:18] apergos: once it has been running for a few minutes, could you give us a tail of the log? [12:01:37] i want to see whether it scales like I expect it to [12:03:18] on hume? [12:03:53] New review: Hashar; "Annnnnd this is a bug in puppet :( See https://projects.puppetlabs.com/issues/2053#note-18" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/54970 [12:04:02] and which log do you want, dispatcher or dispatcher2? [12:04:33] DanielK_WMDE: [12:06:18] apergos: and the way is? [12:06:48] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [12:06:52] huh?
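
(A minimal sketch of the dispatcher cron job discussed above, written as a Puppet cron resource; the resource title is invented and user/environment attributes are omitted, so this is not the contents of the merged change 55904. The command is aude's quoted crontab line with one adjustment: with the original "2>&1 >> file" ordering, stderr still goes to cron's own stdout rather than the log, so the conventional ">> file 2>&1" is used here. Overlapping runs are intentional per the discussion, since the dispatcher is designed to run several instances in parallel.)

    # Sketch only -- illustrative, not production code.
    cron { 'wikidata-dispatcher':
        ensure  => present,
        minute  => '*/5',   # every 5 minutes; runs overlap because --max-time is 900s
        command => '/usr/local/bin/mwscript extensions/Wikibase/lib/maintenance/dispatchChanges.php --wiki enwikidata --max-time 900 >> /var/log/wikibase/dispatcher.log 2>&1',
    }
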
[12:07:58] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [12:08:38] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 12:08:29 UTC 2013 [12:08:49] PROBLEM - Puppet freshness on mw1099 is CRITICAL: Puppet has not run in the last 10 hours [12:08:50] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [12:09:00] Nikerabbit: I didn't get your question [12:09:48] PROBLEM - Puppet freshness on mw1025 is CRITICAL: Puppet has not run in the last 10 hours [12:09:48] PROBLEM - Puppet freshness on mw1121 is CRITICAL: Puppet has not run in the last 10 hours [12:10:48] PROBLEM - Puppet freshness on mw1130 is CRITICAL: Puppet has not run in the last 10 hours [12:11:08] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 12:10:58 UTC 2013 [12:11:49] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [12:13:39] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 12:13:32 UTC 2013 [12:13:48] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [12:14:37] apergos: both logs, please [12:14:48] just put them on a pastebin [12:14:48] i'm off for lunch [12:14:48] thanks! [12:14:48] PROBLEM - Puppet freshness on mw1120 is CRITICAL: Puppet has not run in the last 10 hours [12:14:48] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 12:14:41 UTC 2013 [12:15:48] PROBLEM - Puppet freshness on mw1160 is CRITICAL: Puppet has not run in the last 10 hours [12:15:48] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [12:15:58] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 12:15:48 UTC 2013 [12:16:48] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [12:16:48] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 12:16:47 UTC 2013 [12:17:48] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [12:18:28] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 12:18:20 UTC 2013 [12:18:48] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [12:20:53] apergos: how to run jobs for testwiki? [12:21:17] oh. sorry! [12:21:32] you need to be on a jobqueue host... oh do you mean on a local instance? [12:21:43] apergos: test.wikipedia.org [12:22:14] hmm probably still need to be on a jobqueue host [12:22:22] just a sec I'll find the command [12:23:48] PROBLEM - Puppet freshness on mw1001 is CRITICAL: Puppet has not run in the last 10 hours [12:23:49] DanielK_WMDE: http://p.defau.lt/?Ob9DpqN_O0TR9JEZHgX4pw [12:23:56] ok now lemme find the job queue thing [12:25:38] php MWScript.php runJobs.php --wiki="$db" --procs="$forkcount" --type="$type" --maxtime=$hpmaxtime this is how it gets run on our hosts [12:27:56] types should be one of sendMail enotifNotify uploadFromUrl MoodBarHTMLMailerJob ArticleFeedbackv5MailerJob RenderJob [12:28:13] apergos: and how do I find what is a jobqueue host (which I can access?) [12:28:17] I think if you don't specify then it's refreshLinks2 or something [12:28:28] really? those are just the priority types [12:29:04] the jobqueue hosts have the role::applicationserver::jobrunner stanza in site.pp in the puppet repo [12:29:27] test.wp is a dedicated server still isn't it? [12:29:37] you could try running from there, it might work [12:29:50] apergos: and you vouch it wont break anything? 
;) [12:29:51] assuming you're on it I mean [12:30:00] I don't vouch anything at all [12:30:14] just don't give it a large number of procs, give it like 2 [12:30:44] but really, I known only a little and the docs are scarce... I'm not sure if testwp is actually running on fenari if the code is there [12:30:58] no it's not [12:34:50] srv193 [12:35:33] the squid settings have that, which are /home/w/conf/squid [12:36:05] and more specifically text-settings.php [12:36:10] oh [12:36:17] (on fenari) [12:36:26] it's a hack, remember [12:37:05] PROBLEM - RAID on labstore1 is CRITICAL: CRITICAL: Partially Degraded [12:39:34] Cannot run a MediaWiki script as a user in the group wikidev [12:39:47] fair enough but what user should it be? [12:41:33] Nikerabbit: mwdeploy I guess [12:42:43] the message should give instructions i guess [12:42:44] or even attempt to sudo [12:43:31] !g I033acd9132e47b840c76da995d14c92e8376775d | Nikerabbit [12:43:31] Nikerabbit: https://gerrit.wikimedia.org/r/#q,I033acd9132e47b840c76da995d14c92e8376775d,n,z [12:43:53] Nikerabbit: that is the change, apparently you should even sudo -u apache [12:44:35] ah sorry (I was coding in another window) [12:53:10] hashar: cool, though commit message is weird place for documentation [12:59:18] Nikerabbit: sending a change :-] [13:04:11] Nikerabbit: isn't that the reason why you suggested to make a search for commits, docs etc. all together? :) [13:04:23] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [13:06:33] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [13:08:04] Change merged: Mark Bergsma; [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/54362 [13:08:17] Change merged: Mark Bergsma; [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/54363 [13:08:29] Change merged: Mark Bergsma; [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/54683 [13:08:53] New patchset: Hashar; "usage help when mwscript is not run as `apache` user" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56146 [13:09:09] New review: Hashar; "Follow up in https://gerrit.wikimedia.org/r/56146 which improves the error message." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/44200 [13:09:12] Nikerabbit: https://gerrit.wikimedia.org/r/56146 :-) [13:11:58] apergos: hm, that looks like the log from a single process, not 3 processes per file. i guess we should add the PID to the output. [13:12:12] can you cive me more lines? say, 500 from each file? [13:14:44] http://p.defau.lt/?Nt0Y8zFul7tR05v5cWzg_A I can give you what my scrollback has [13:14:45] here's one [13:15:40] http://p.defau.lt/?tNCmGeGxlrFulzc8IOZimA here;;s two [13:21:28] New patchset: Matthias Mullie; "on frwiki, show CTA4 (signup or login) for 100%" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56149 [13:23:03] New patchset: Odder; "(bug 46461) Set $wgAutoConfirmCount to 50 for Wikidata" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56150 [13:23:16] apergos: that looks a lot better, thanks [13:23:26] sure [13:41:32] New patchset: Aude; "Remove wikidata.org from CORS, keep only *.wikidata.org" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56153 [13:42:55] New review: Aude; "please also see https://gerrit.wikimedia.org/r/#/c/49069/ (and appreciate review of that patch) to m..." 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56153 [13:57:58] !log Jenkins: disabled the old Gerrit Trigger Plugin {{bug|46415}} [13:58:06] Logged the message, Master [14:05:35] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [14:07:45] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [14:09:10] !log jenkins has been restarted by mistake :-( Job building will be unavailable for up to half an hour. [14:09:16] Logged the message, Master [14:11:27] New patchset: Ottomata; "Syncing edit logs from gadolinium instead of locke" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56158 [14:12:29] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56158 [14:12:47] Change abandoned: Matthias Mullie; "(no reason)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54807 [14:13:58] New patchset: Demon; "Properly configure hooks-bugzilla plugin based on feedback" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54989 [14:16:32] Change abandoned: Demon; "Squashed this into I1219d6ab." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/55048 [14:17:03] New review: Demon; "Ignore PS1, just review PS2." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54989 [14:17:16] New patchset: Matthias Mullie; "Update frwiki config" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/55946 [14:18:49] New review: Demon; "recheck" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54989 [14:21:49] Change abandoned: Matthias Mullie; "Has been folded into https://gerrit.wikimedia.org/r/#/c/55946/" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56149 [14:28:41] New patchset: Matthias Mullie; "Update frwiki config" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/55946 [14:29:44] New patchset: Ottomata; "Now using varnish hostnames to filter for mobile logs. Now syncing mobile logs from gadolinium to stat1." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56161 [14:29:53] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56161 [14:35:17] New patchset: Ottomata; "Changing minute on misc::statistics::rsync_job to use fqdn_rand to spread out rsync jobs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56162 [14:35:30] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56162 [14:48:22] New patchset: Ottomata; "Undoing the last change. fqdn_rand generates the same number on the same host (duh.)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56164 [14:49:00] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56164 [14:51:54] PROBLEM - Puppet freshness on virt2 is CRITICAL: Puppet has not run in the last 10 hours [14:56:54] PROBLEM - Puppet freshness on virt3 is CRITICAL: Puppet has not run in the last 10 hours [15:04:12] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [15:05:04] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: be_x_oldwiki to 1.21wmf12 [15:05:11] Logged the message, Master [15:06:23] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [15:36:54] can someone help..trying to deploy mw1209+ but running into rsync issues. puppet runs error free. 
added to site.pp and all dsh/node_groups..set to false in pybal [15:36:55] http://p.defau.lt/?lGV0Vkj1HN6G8whoaozYVg [15:37:40] there's been a few lock file errors like that around for a few days [15:37:41] reedy@fenari:/home/wikipedia/common$ ls -al php-1.21wmf11/.git/modules/extensions/MWSearch/index.lock [15:37:41] -rwxrw---- 1 catrope wikidev 0 Mar 20 23:58 php-1.21wmf11/.git/modules/extensions/MWSearch/index.lock [15:38:00] reedy: i understand the lock files to not be an issue [15:38:17] What is then? [15:38:19] it is this sync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1536) [generator=3.0.9] [15:38:34] Isn't that just a result of the above errors? [15:38:36] at least that is what I was told by notpeter [15:38:48] live-1.5 was a folder, but it's now a symlink (as of last night) [15:40:40] The files on disk look sensible [15:41:24] reedy: yes...it looks like it is fetching now. [15:41:26] thx [15:41:42] reedy@fenari:/home/wikipedia/common$ apache-fast-test test.txt mw1209 mw1210 [15:41:42] testing 1 urls on 2 servers, totalling 2 requests [15:41:42] spawning threads... [15:41:42] http://en.wikipedia.org/wiki/Main_Page [15:41:42] * 200 OK 62443 [15:47:45] paravoid, mind if I poke again for a review of this: [15:47:46] https://gerrit.wikimedia.org/r/#/c/49710/ [15:47:46] ? [15:47:50] (hope you don't mind, cause I just poked!) [15:47:55] I don't mind :) [15:47:57] looking [15:48:07] PS19? [15:48:07] this is something that is on the analytics work in progress features mingle bla bla [15:48:09] lol! [15:48:11] so they ask me abou tit every day [15:48:13] haha [15:48:15] poor ottomata [15:49:02] if you want to see the inline comments, they are on patchset 13 [15:49:16] but I responded to the questionable ones in the main comment for 19 [15:51:47] New patchset: Hashar; "0.6.1-2 gbp.conf and tweaks" [operations/debs/python-voluptuous] (master) - https://gerrit.wikimedia.org/r/56168 [15:52:12] PROBLEM - profiler-to-carbon on professor is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/udpprofile/sbin/profiler-to-carbon [15:53:09] RECOVERY - profiler-to-carbon on professor is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/udpprofile/sbin/profiler-to-carbon [15:54:16] so, [15:54:36] I think moving upstart_job to a module is a good thing [15:54:44] even though it's a bit too much to ask of you for limn :) [15:55:03] well, if it were to be just moving the current define to a module, that's easy [15:55:04] how's that upstart module you mentioned? [15:55:06] is it any good? [15:55:10] complicated, looks pretty compllete [15:55:23] https://github.com/bison/puppet-upstart [15:55:40] oh that's for defining jobs too [15:55:59] oh my debian favorites :-D Got you a patch for the python-voluptuous debian package. https://gerrit.wikimedia.org/r/#/c/56168/ :] [15:56:00] yeah, via parameters, which is good, but probably won't cover all cases [15:56:05] added you both on review. [15:56:38] :) [15:56:57] paravoid: Zuul is not going to be packaged anytime soon. It needs a few more dependencies :] [15:58:13] hashar: oh hah, just saw your mails to debian-python [15:59:04] paravoid: and the good news are that OpenStack infrastructure people are really willing to package. [15:59:14] cool [15:59:19] I'd be willing to sponsor you btw [15:59:28] you don't need to go through mentors or zigo [15:59:37] New review: Lcarr; "Please rebase this on top of the recent nagios.pp -> misc/icinga.pp moves and then we should be good..." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/37441 [15:59:43] yeah still have to have a look at the debian process to get a package submitted. Then I will know what a sponsor is :-] [15:59:53] but to sponsor it I'd like to see it under the python modules repos and processes [16:00:11] a sponsor is someone who reviews the package for you and ultimately uploads it into Debian [16:00:34] so my next step is to get the package hosted on alioth / debian-python svn [16:00:38] yep [16:00:46] although I hate svn nowadays, I guess I have to adapt [16:01:07] New review: Jeremyb; "seems technically fine but still needs some policy decisions." [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/56101 [16:03:22] New patchset: Aude; "Enable Wikidata data transclusion for some clients" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56169 [16:04:44] New review: Aude; "we would like https://gerrit.wikimedia.org/r/#/c/56165/ merged and deployed first" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56169 [16:04:50] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [16:07:00] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [16:07:54] ottomata: so the upstart module doesn't what upstart_job does [16:08:11] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 16:08:02 UTC 2013 [16:08:17] paravoid, no? [16:08:51] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [16:09:05] upstart::job installs the /etc/init file, i see that it also manages the service…which is ok [16:09:06] i guess [16:09:11] i think i'd rather it didn't [16:09:18] that's not what upstart_job does :) [16:09:30] upstart job installs the /etc/init file, no? [16:09:35] upstart_job* [16:09:40] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 16:09:34 UTC 2013 [16:09:40] well, not that just that [16:09:46] not just that even [16:09:47] gah [16:09:50] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [16:10:27] that and symlinks /etc/init.d to upstart-job [16:11:00] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 16:10:51 UTC 2013 [16:11:23] yeah [16:11:31] anything else? [16:11:34] I don't see why it does exec and not service after that [16:11:50] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [16:12:10] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 16:12:01 UTC 2013 [16:12:11] why ours? 
[16:12:15] yeah i'm not sure either [16:12:22] New patchset: Mark Bergsma; "Add cp3004 to the esams upload pool" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56171 [16:12:27] its also weird that $install and $start are "false" [16:12:34] by default [16:12:50] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [16:13:00] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 16:12:59 UTC 2013 [16:13:04] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56171 [16:13:50] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [16:14:00] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 16:13:50 UTC 2013 [16:14:22] New patchset: Hashar; "Jenkins test (DO NOT SUBMIT)" [operations/debs/python-voluptuous] (master) - https://gerrit.wikimedia.org/r/56172 [16:14:50] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [16:15:20] RECOVERY - Puppet freshness on db11 is OK: puppet ran at Wed Mar 27 16:15:17 UTC 2013 [16:15:50] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [16:19:37] New patchset: Hashar; "Jenkins test (DO NOT SUBMIT)." [operations/debs/python-voluptuous] (master) - https://gerrit.wikimedia.org/r/56172 [16:20:51] off for now [16:27:29] New review: awjrichards; "(2 comments)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52606 [16:46:47] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/55301 [16:51:35] paravoid, sorry, back, I was grabbing lu ch [16:51:35] lunch [16:51:35] so what should I do with the limn puppet stuff? [16:51:47] New review: Dzahn; "root@neon:~# /usr/lib/nagios/plugins/check_http -H 'search1023' -p 8123 -url='/status/' --regex FAIL..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51174 [16:53:42] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56146 [16:54:34] New review: Dzahn; "see above in my last comment, am i using the wrong search boxes or does it not work yet? the ones i ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51174 [17:03:14] New review: Daniel Kinzler; "Do we somewhere also force allowDataTransclusion to false for all the other wikis? It's true per def..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56169 [17:06:23] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [17:07:33] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [17:11:24] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [17:14:49] New patchset: Mattflaschen; "Add labs redis subclassing the main one and setting directory." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54970 [17:16:44] New patchset: Aaron Schulz; "Cleanup high priority jobs listing." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/56179 [17:18:43] New patchset: Jeremyb; "(bug 45066) Disable anonymous page creation at tr.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56101 [17:19:26] New review: Jeremyb; "carry forward -1 for shellpolicy" [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/56101 [17:19:33] notpeter: https://gerrit.wikimedia.org/r/#/c/56179/1 [17:20:44] AaronSchulz: looks suspicious [17:21:00] possibly also photoshopped [17:21:15] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56179 [17:21:53] ok, merged on sockpuppet [17:28:12] mark, hi, do you want to discuss our varnish mobile ACL issue at some point? Would love to get some direction while i'm in SF [17:28:56] or who would be the best varnish discussion contact? [17:29:12] Change abandoned: Demon; "Not going to do this. Certainly not this way." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53173 [17:29:13] use ops@? :) [17:30:03] well, i was hoping to get some direction perspective before asking intelligent questions :) [17:30:09] New patchset: RobH; "removing /home from terbium, death to nfs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56182 [17:30:53] also, we are looking for some help to set up a varnish test env [17:31:00] yurik: apt-get install varnish [17:31:01] :D [17:31:15] I'm sure others would be interested in those directions too, both in and out of the ops team [17:31:20] i knew it was easy! :) but i was hoping to mimic our prod env [17:31:56] goals being: 1) long list of ACLs performance impact 2) unit testing our config changes [17:32:28] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56182 [17:32:31] I profiled the acl matches [17:32:39] to see their impact [17:33:04] did extensive testing the other night, I couldn't sleep :) [17:33:15] :) what's the verdict? [17:33:44] the original was 20-25% slower than the revised one [17:34:05] or maybe more [17:34:11] I should fetch the numbers again, I keep forgetting [17:34:35] 41.20 2.94 2.94 100000000 29.41 34.39 match_acl_named_beeline_orig [17:34:38] 33.35 5.32 2.38 100000000 23.81 28.79 match_acl_named_beeline_new [17:34:48] 23.81 vs. 29.41 ns per call (on my system) [17:35:27] paravoid, i would think the "unmatched ip" is the worst case scenario [17:35:33] no it's not [17:35:49] well, it's not for testing a single ACL [17:36:02] why not, it would go through all the nested if/elses? [17:36:13] it's hierarchical [17:36:24] worst case would be the 217 range in the beeline acl [17:36:55] so if we didn't have ANY acls in that config file, how much faster will the varnish file work? [17:36:59] it has to go until the middle in the first-level comparisons and then do all the second-level comparisons [17:37:06] that's again for a single ACL [17:37:31] if you want to test against all of them, bailing out on first match, then unmatched would probably be worst case [17:37:57] anyway, as I said, maybe this should be a mailing list thread? [17:38:07] I'm sure both mark and binasher would be interested [17:38:11] ok, will email :) [17:38:22] and probably more ops people [17:38:24] please reply with your perf data [17:38:31] Ryan_Lane: labstore1 - RAID, partially degraded, creating ticket to have its disk replaced [17:38:55] paravoid, how complex would it be to implement an ACL switch statement? [17:39:00] what do you mean? 
[17:39:13] a new VCL language construct [17:39:34] switch (req.ip) { case ACL1: code; break; case ACL2: ... [17:40:05] better question would be - is this worth an investment to begin with :) [17:40:13] er? [17:40:17] I don't understand [17:40:47] well, the current bottleneck as i see it is the linear algo used by varnish to check against ACL lists [17:41:13] !log authdns-update, pdns on ns1 croaked, restarted [17:41:16] instead, varnish script compiler should generate an ordered list of IPs [17:41:19] Logged the message, Master [17:41:26] and some code to do binary search in that list [17:41:40] this instantly shrinks the ACL search to log N [17:41:48] ?? [17:41:58] and makes VCL code much more consise [17:42:23] could you give an example of what you're proposing? [17:42:40] a switch statement vs. if/elses doesn't make any difference algorithm-wise [17:42:42] basically, instead of writing if (req.ip ~ acllist1) { set ... } else if (req.ip ~ acllist2 ) { set ... } [17:42:52] mutante: thanks [17:42:58] sure it does if the search is optimized internally [17:43:07] the generated code would be : [17:43:50] switch( searchIpInAllACLs(req.ip) ) { case 1: set ...; break; case 2: set ...; } [17:44:11] and the searchIp is autogenerated code to binary search in a list of all ip blocks [17:44:54] yurik: buy the ticket, take the ride [17:44:59] yurik: i mean, write the code, submit the patch [17:45:37] binasher, i would love to, but suspect it will be a fairly complex undertaking, so would like to get a good grasp of how much this is needed first [17:45:51] that would be more optimal algorithmically indeed [17:46:01] i will spell it out in my email [17:46:23] it wouldn't be that complex btw [17:46:28] you can write inline C in VCL [17:47:00] you don't have to change the VCL syntax, just write a C function that gets an ip and returns the carrier string [17:47:02] indeed, it doesn't have to be a varnish patch at all [17:47:09] inline C or vmod [17:47:38] and then do patricia lookups or whatever [17:47:46] paravoid, one of my goals is to make the script autogenerated based on the user settings [17:47:57] which user settings? [17:48:01] ip blocks [17:48:09] you can make your code read a file if that's what you want [17:48:16] as long as you don't make it to read it on every request [17:48:32] New patchset: RobH; "removal of home mount dependent entry" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56185 [17:48:40] or we'll have a stroke or something [17:48:47] hehe [17:49:04] that might be a good idea - does varnish have any kind of static state store? [17:49:19] or should i just declare a global static var :) [17:49:25] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56185 [17:49:27] refreshing it would be a pain [17:50:05] all that is of course orthogonal to communicating to telcos that we want ip blocks and not just random /31s or /32s [17:50:27] paravoid, that's up to the biz ppl really, they are the ones dealing with the telcos [17:50:33] and applying common sense by aggregating those blocks [17:50:34] ^demon: did hashar fix the zuul bug, gerrit is checking things fast today. [17:50:35] ? [17:50:38] (its awesome) [17:50:45] <^demon> Yeah, I think he cherry picked something in. 
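
(Below is a toy version of the approach floated above -- "write a C function that gets an ip and returns the carrier string" backed by a sorted table and a binary search, so the per-request cost is O(log N) instead of one linear ACL walk per carrier. The ranges, carrier names and function names are all invented, and real use would live in a vmod or an inline-C block in the VCL rather than a standalone program; IPv4 only, for brevity.)

    #include <arpa/inet.h>
    #include <stdint.h>
    #include <stdio.h>

    struct ip_range {
        uint32_t    start;    /* first address in the block, host byte order */
        uint32_t    end;      /* last address in the block */
        const char *carrier;  /* tag handed back to VCL, e.g. into a header */
    };

    /* Sorted by .start and non-overlapping; in practice this table would be
     * generated from the per-carrier IP blocks, not written by hand. */
    static const struct ip_range ranges[] = {
        { 0x0A000000u, 0x0AFFFFFFu, "example-carrier-a" },  /* 10.0.0.0/8     */
        { 0xC0A80000u, 0xC0A8FFFFu, "example-carrier-b" },  /* 192.168.0.0/16 */
    };
    static const size_t nranges = sizeof(ranges) / sizeof(ranges[0]);

    /* Binary search: returns the carrier tag for ip, or NULL if nothing matches. */
    static const char *lookup_carrier(uint32_t ip)
    {
        size_t lo = 0, hi = nranges;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (ip < ranges[mid].start)
                hi = mid;
            else if (ip > ranges[mid].end)
                lo = mid + 1;
            else
                return ranges[mid].carrier;
        }
        return NULL;
    }

    int main(void)
    {
        struct in_addr a;
        if (inet_pton(AF_INET, "192.168.3.4", &a) != 1)
            return 1;
        const char *c = lookup_carrier(ntohl(a.s_addr));
        printf("%s\n", c ? c : "no match");
        return 0;
    }
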
[17:50:46] this isn't a "business" requirement, this is a technical requirement [17:50:49] He backported a fix, I think [17:50:59] it ran the check and updated my changeset before i could click initial link [17:51:00] really, we should just say no when < /24s are requested [17:51:08] * RobH is amazingly happy about this [17:51:13] \o/ [17:51:14] paravoid, i already told them, so lets hope it will be that way :) [17:51:28] yeah, I don't think anyone's going to come back [17:51:44] maybe when we -1 the next patch submitted 24h before a launch :) [17:51:57] heh [17:52:05] wondered why they submitted some IPs with a /32 and some without any netmask at all [17:52:15] i guess noone thought adding those ACLs would be a problem [17:52:46] even if you ignore the performance benefits, this was just loads of junk [17:52:49] in our configs [17:53:01] like .6.0/24 and then .6.5, .6.6, .6.7 etc. [17:53:19] (paravoid, i can limn talk real talk whenever you can? :) ) [17:53:38] mutante, they just wanted to be difficult... or, more likely, they have some internal file they keep for all their mobile data clients, and they didn't bother cleaning it up [17:53:46] ottomata: yeah, sorry about that... [17:53:54] no worries [17:53:58] ottomata: I promise I'll give a meaningful hopefully final review today [17:54:09] ok cool, the biggest open question is the upstart thing [17:54:12] New review: Alex Monk; "Depends on abandoned patchset." [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/56130 [17:54:17] if you've got advice on that right now I can work on it before you final review [17:55:32] New review: Ottomata; "Ok with me, but I'm still learning git buildpackage best practices." [operations/debs/python-voluptuous] (master); V: 1 C: 1; - https://gerrit.wikimedia.org/r/56168 [18:00:59] New review: Ottomata; "Did I totally miss the setup.py stuff Faidon was complaining about? I don't see that at all." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/54324 [18:05:26] !log mw1130 - kill and fix puppet runs [18:05:32] Logged the message, Master [18:06:27] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [18:08:33] !log kaldari synchronized php-1.21wmf12/extensions/Echo 'syncing Echo for wmf12' [18:08:36] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [18:08:40] Logged the message, Master [18:08:49] !log bunch of mw* hosts: fix puppet runs [18:08:56] Logged the message, Master [18:11:06] PROBLEM - Puppet freshness on mw1095 is CRITICAL: Puppet has not run in the last 10 hours [18:15:05] New review: Ottomata; "The *-labs-proxy files are still in the module. Generic modules like this shouldn't have to be modi..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43886 [18:19:53] New review: Mattflaschen; "It works, so now I just need a review." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54970 [18:22:54] !log installing package upgrades on neon (Icinga), incl. ganglia-monitor,gmeta [18:23:01] Logged the message, Master [18:24:06] PROBLEM - Puppet freshness on mw62 is CRITICAL: Puppet has not run in the last 10 hours [18:24:38] New review: Odder; "Scratch that last one, I must have been fast asleep (at 4:42) when I was writing that comment; $wgAu..." 
[operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/56055 [18:25:06] PROBLEM - Puppet freshness on mw1135 is CRITICAL: Puppet has not run in the last 10 hours [18:27:50] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Everything else to 1.21wmf12 [18:27:57] Logged the message, Master [18:30:35] wee [18:31:15] New patchset: Reedy; "Everything else to 1.21wmf12" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56188 [18:32:39] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56188 [18:36:17] PROBLEM - Puppet freshness on mw51 is CRITICAL: Puppet has not run in the last 10 hours [18:36:36] mutante: you killed off db11 last night right? [18:39:30] !log reedy synchronized php-1.21wmf12/extensions/Wikibase [18:39:37] Logged the message, Master [18:40:04] cmjohnson1: yea, added to decom.pp [18:40:18] cmjohnson1: but for some reason not removed from monitoring .. i saw ..hrmm [18:40:47] okay..icinga is still reporting puppet....i wonder if icinga doesn't pull from decom.pp to remove from monitoring [18:40:55] for which one? [18:40:59] New patchset: Reedy; "Enable Wikidata data transclusion for some clients" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56169 [18:41:01] which host I mean [18:41:05] db11 [18:41:12] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56169 [18:41:43] so, what needs to be run to be removed from icinga is [18:41:46] cmjohnson1: it should, it does things like nagios used to [18:41:56] are we sure puppets run on neon? [18:42:08] a puppet run on stafford or sockpuppet, then a cronjob that runs once every few hours or so to remove it from exported resources [18:42:17] then a puppet run on neon to remove it from nagios [18:42:22] this can take quite a while [18:42:38] !log reedy synchronized wmf-config/InitialiseSettings.php [18:42:44] if its a known decom, you can login to icinga as admin, and acknowledge it [18:42:45] Logged the message, Master [18:42:49] so its not on main screen until neon catches up. [18:42:51] cmjohnson1: nevertheless, i disabled all notifications for it [18:43:10] nutante: how? [18:43:15] mutante ^ [18:43:32] oh, you know, it is already gone, kind of, see here: https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=1&host=db11 [18:43:43] now it's probably neon's problem with snmptt [18:44:27] cmjohnson1: by clicking in icinga web ui, go to a host page, and "Disable notifications for all services on this host" [18:44:48] ok..simple enough [18:44:51] thx [18:44:51] you need to be logged in, using icinga-admin.wm [18:44:54] np [18:45:08] cmjohnson1: https://icinga-admin.wikimedia.org/icinga/ [18:45:16] uses your gerrit login details (as they are ldap based) [18:48:48] New patchset: RobH; "removing home dependent maint script entry" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56190 [18:49:14] paravoid: i keep seeing "flush-252:2" in ps of neon, isn't seeing flush processes use CPU a lot -> kernel bug? i mean it's not like it's using a ton, but it keeps popping up [18:50:12] also, upgraded gmetad and ganglia-monitor, there were package upgrades avail. 
[18:50:31] at least ganglia-monitor is a wmf package [18:50:38] eh, both are [18:50:43] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56190 [18:50:54] !log pushing mw1209-1220 into service [18:51:00] Logged the message, Master [18:52:17] RECOVERY - Puppet freshness on mw1001 is OK: puppet ran at Wed Mar 27 18:52:13 UTC 2013 [18:52:17] RECOVERY - Puppet freshness on mw1095 is OK: puppet ran at Wed Mar 27 18:52:14 UTC 2013 [18:52:17] RECOVERY - Puppet freshness on mw1025 is OK: puppet ran at Wed Mar 27 18:52:14 UTC 2013 [18:52:27] RECOVERY - Puppet freshness on mw51 is OK: puppet ran at Wed Mar 27 18:52:17 UTC 2013 [18:52:27] RECOVERY - Puppet freshness on mw62 is OK: puppet ran at Wed Mar 27 18:52:18 UTC 2013 [18:52:27] RECOVERY - Puppet freshness on mw1120 is OK: puppet ran at Wed Mar 27 18:52:19 UTC 2013 [18:52:27] RECOVERY - Puppet freshness on mw1099 is OK: puppet ran at Wed Mar 27 18:52:19 UTC 2013 [18:52:27] RECOVERY - Puppet freshness on mw1121 is OK: puppet ran at Wed Mar 27 18:52:23 UTC 2013 [18:53:35] <-- i believe these are back because i just killed some stuck snmptt proc [18:53:38] bbl [18:53:40] RECOVERY - Puppet freshness on mw1130 is OK: puppet ran at Wed Mar 27 18:53:32 UTC 2013 [18:53:40] RECOVERY - Puppet freshness on mw1135 is OK: puppet ran at Wed Mar 27 18:53:34 UTC 2013 [18:56:41] New patchset: Demon; "Don't use home for maintenance scripts where possible" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56192 [18:58:58] New review: Demon; "misc::maintenance::update_special_pages and misc::maintenance::refreshlinks still need some love to ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56192 [19:05:18] !log reedy Started syncing Wikimedia installation... : Rebuild message cache for WikiData deployment [19:05:26] Logged the message, Master [19:08:35] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [19:10:45] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [19:18:30] Ugh [19:18:41] Scap just gets noisier and noisier... [19:31:11] New review: Siebrand; "This fixes bug 46603." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56179 [19:31:52] New review: Aaron Schulz; "I wasn't doing this for that bug, it was just on my TODO list and there happened to be a bug for it." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56179 [19:37:15] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Puppet has not run in the last 10 hours [19:37:15] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Puppet has not run in the last 10 hours [19:37:15] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours [19:37:51] !log reedy Finished syncing Wikimedia installation... : Rebuild message cache for WikiData deployment [19:37:58] Logged the message, Master [19:42:46] New review: Ori.livneh; "@Ottomata: Yeah -- I fixed it in PS2." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54324 [19:43:12] !log reedy synchronized php-1.21wmf12/extensions/ParserFunctions/ [19:43:21] Logged the message, Master [19:44:05] New review: Ottomata; "Ah, I see it, I thought I was looking at patchset 1 (gerrit browsing weirdness)." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/54324 [19:51:08] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56192 [19:51:23] RECOVERY - RAID on labstore1 is OK: OK: State is Optimal, checked 12 logical device(s) [19:52:56] New patchset: Odder; "(bug 46182) Set LQT as opt-out on se.wikimedia (chapter wiki)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56130 [19:54:26] New patchset: RobH; "adding back fixed entries to terbium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56201 [19:55:31] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56201 [19:56:01] !log reedy Started syncing Wikimedia installation... : Rebuild localisation cache to fix he related ParserFunctions problems [19:56:09] Logged the message, Master [19:56:36] mutante or sbernardin did either one of you do something w/ disk 9 on labstore1? [19:57:03] No [19:58:07] mutante fixed some raid montioring [19:58:10] so it may have been that [19:58:16] i asked him when i saw it scroll past [19:58:20] ok...cuz mutante had a ticket for degraded raid but when I checked disk 9 was in rebuild [19:58:21] he didnt actually touch labstores [19:58:30] interesting [19:58:54] but it finished and all is good now...but not sure why it would rebuild on its own [19:59:16] so you can resolve ticket, but note what happened, that it rebuilt itself [19:59:23] yep [20:04:13] New patchset: Odder; "(bug 46182) Set LQT as opt-out on se.wikimedia (chapter wiki)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56130 [20:08:17] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [20:08:47] PROBLEM - Puppet freshness on cp3010 is CRITICAL: Puppet has not run in the last 10 hours [20:10:24] !log reedy Finished syncing Wikimedia installation... : Rebuild localisation cache to fix he related ParserFunctions problems [20:10:27] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [20:10:30] Logged the message, Master [20:13:13] New patchset: Odder; "(bug 46182) Set LQT as opt-out on se.wikimedia (chapter wiki)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56130 [20:13:48] That was quick ;) [20:13:54] 14 minutes! [20:17:28] cmjohnson1: i don't know how that fixed .. ugh too late [20:20:47] PROBLEM - Puppet freshness on virt1005 is CRITICAL: Puppet has not run in the last 10 hours [20:45:58] New patchset: RobH; "terbium tweaking" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56247 [20:46:54] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56247 [21:03:08] New patchset: Ottomata; "Move geoip to a module." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53714 [21:05:51] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [21:07:48] ottomata: thank you for taking care of the geoip module :-] [21:08:01] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [21:08:10] yeah, i'm having weird commit problems [21:08:13] trying to fix [21:08:17] jenkins says it can't merge and I have to rebase [21:08:18] uhhhh [21:08:29] actually, since youer' here [21:08:33] i have a conflict on contint.pp [21:08:40] ottomata: most probably because one of the old files has been changed meanwhile [21:08:43] oh [21:08:49] did you delete this class? 
[21:08:50] misc::contint::analytics::packages [21:09:10] I guess you should accept whatever comes from HEAD [21:09:13] yeah [21:09:15] plus my change, ok [21:09:17] just checking [21:09:31] oh cool, yeah [21:09:32] ok [21:09:33] i see [21:09:39] we don't need to include geoip on contint anymore? [21:09:45] we do iirc [21:09:47] since you deleted misc::contint::analytics::packages [21:09:57] production HEAD isn't doing it anymore [21:09:58] it is in its own module now [21:10:05] right but contint.pp isn't including it [21:10:05] New patchset: Asher; "changing s1 slave weights for testing" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56313 [21:10:08] anywhere [21:10:10] even in production head [21:10:14] so the include should be something like contint::analytics::packages [21:10:16] it hasn't been moved to a module there yet [21:10:20] ah [21:10:38] unless you moved it to a different file [21:10:53] nothing matching 'analytics::packages' in the whole project [21:10:54] so :) [21:10:57] guess its ok? [21:11:02] I guess :-] [21:11:13] it is not going to uninstall the packages anyway [21:11:29] Change merged: Asher; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56313 [21:11:40] ja [21:12:08] New patchset: Ottomata; "Move geoip to a module." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53714 [21:12:22] !log asher synchronized wmf-config/db-eqiad.php 'changing s1 slave weights for testing' [21:12:22] New review: Odder; "Community approval for change gathered at https://tr.wikipedia.org/w/index.php?oldid=12797562#Teklif" [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/56101 [21:12:29] Logged the message, Master [21:14:29] ^demon: see the mangled link in that last comment from gerrit-wm [21:14:32] :( [21:14:53] New review: Ottomata; "Ok! Lots of changes since the last patchset." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53714 [21:15:41] ottomata: also the geoip module should install the ubuntu database when in labs. We do not have access to the private geoip.dat [21:15:50] ottomata: I think faidon gave a hint about it on code review [21:17:15] oh right hmmmmm [21:17:21] i can do that! [21:17:24] :) [21:18:11] hey guys, who's best to ask about nginx proxy stuff? [21:18:18] analytics people are curious about how nginx proxies to the mobile site [21:18:28] you mean HTTPS? [21:18:48] (ottomata) [21:18:49] ottomata: you rocks :-] [21:19:00] ahh nginx [21:19:19] I looked at the prod manifest. Mobile would like the beta cluster to support HTTPS [21:19:37] which mean we need to abstract out the nginx conf to fit with another context [21:19:43] I guess that would apply for analytics too [21:20:35] jeremyb_, yes, and IpV6, but yeah HTTPS [21:20:47] well, they're asking because I thikn they are looking for some data [21:20:51] ottomata: so what's the question? :) [21:21:00] and are wondering if they are missing some because the nginx requests might not go to cp1041-1044 [21:21:16] we are only collecting data from those hosts, because we assume that they are the only frontend cache hosts serving mobile data [21:21:22] mobile sites* [21:21:32] but, if nginx happens to proxy elsewhere, then they would be missing some data [21:21:35] for HTTPS [21:21:51] can't we have the nginx frontend send them the logs ? [21:22:29] 8: [21:22:34] heheheh [21:22:37] well [21:22:37] 1. [21:22:42] that would duplicate the requests in the logs [21:22:59] 2. 
nginx sends bad sequence numbers, which messes with packet loss monitoring [21:23:02] but! [21:23:22] they need to know IF the logs are not being collected before we change our behavior [21:23:34] if nginx does in fact proxy there, then no probs. [21:23:44] mobilefrontend doesn't embed a backend name in page source? [21:23:52] :( [21:25:52] well empirically i'm using 1041,1042,1044 for the few reqs i just made [21:30:13] New review: Hashar; "The url is wrong, the trailing slash need to be removed. Amending." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51174 [21:31:28] New patchset: Hashar; "monitoring lucene search boxes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51174 [21:32:27] ottomata: what does pybal say? [21:32:43] how do I ask it? [21:32:44] :p [21:32:48] idk [21:32:51] hah [21:33:16] New review: Dzahn; "thanks, this is better:" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/51174 [21:33:38] HTTP CRITICAL: HTTP/1.1 200 OK - pattern found [21:33:38] New review: Hashar; "PS4 removes the trailing slash in the url '/status/' -> '/status' . The commit message had the corre..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51174 [21:33:44] \O/ [21:33:45] scapping... [21:33:53] hashar: eh.. it's CRIT and OK :) [21:34:07] mutante: also some search boxes are wrong right now. I guess some indices have been moved from some boxes to others. [21:34:18] mutante: xyzram is aware of it since he posted on ops-l [21:34:47] mutante: and leslie confirmed that a CRITICAL check will issue a page. So I guess we want to hold this change a bit until all search boxes are fully operational (aka no more reporting FAILED indices)`. [21:34:59] not all CRITICALs [21:35:13] ah [21:35:16] you can check the nagios log to see if it really paged [21:35:22] if we flag it properly :) [21:35:24] hashar: only if you would add critical => critical in the monitor_service [21:35:25] well if we manage to make it not page, that would be nice [21:35:27] or the screams of all ops folks [21:35:31] but I have no idea how to do that [21:35:46] it should not page as it is now [21:35:56] but i don't see why it would be HTTP OK and CRIT at the same time [21:35:59] with exit code 2 [21:36:04] icinga is soooooo slwo [21:36:06] slow* [21:36:29] yeah it need to be tweaked [21:36:31] ;-D [21:36:43] there are too many checks and we hit the number of checks limit [21:36:46] hashar: it's the --invert-regex thing somehow [21:37:04] New patchset: Ottomata; "Move geoip to a module." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53714 [21:37:06] no, it's gmetad and ganglia_parser [21:37:08] well the biggest problem right now is honestly ganglios since we need to run gmetad on there [21:37:22] if we can move ganglios to page from nickel (the ganglia machine) we will be much better [21:37:29] if you want to check that out hashar ? ;) [21:37:48] hashar: with just --regex FAILED: [21:37:48] hashar, check that one out! if realm != production, install geoip-database :) [21:37:59] HTTP OK: HTTP/1.1 200 OK [21:38:10] with --invert-regex --regex FAILED [21:38:24] HTTP CRITICAL: HTTP/1.1 200 OK [21:39:16] yeah that is what you want [21:39:21] FAILED -> cirtical [21:39:23] which server? [21:39:40] search1019 [21:39:50] New patchset: Ottomata; "Move geoip to a module." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/53714 [21:40:09] curl http://search1019.eqiad.wmnet:8123/status [21:40:18] that gives me some zhwiki*.spell indices being in failure [21:40:24] so that should give a CRITICAL error [21:40:48] The program 'curl' is currently not installed. :/ [21:40:52] ah [21:40:55] use fenari :-] [21:41:20] i don't know what you want to achieve though.. [21:41:36] ok [21:41:58] fenari does not have the nagios plugin.. but yeah [21:42:37] so if that is supposed to be CRIT, but still tell you that HTTP itself is 200, then it seems fine [21:42:49] using --invert-regex [21:44:53] New review: Dzahn; "on search1019 with known brokenness:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51174 [21:44:54] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51174 [21:45:17] root@neon:~# echo $? 2 [21:45:17] jeremyb_, i just checked too (not sure if you already told me this), I see [21:45:17] X-Varnish: 1252984284 [21:45:17] Age: 0 [21:45:17] Via: 1.1 varnish [21:45:17] X-Cache: cp1041 frontend miss (0) [21:45:25] when I request https://m.wikipedia.org/ [21:45:29] so, that's good (i think!) [21:46:03] mutante: basically I want to warn whenever the lucene /status page contains FAILED :-] [21:46:12] ottomata: yeah... talk to ma rk or leslie or ryan [21:46:38] mutante: note that it might page you about it [21:46:59] I have no idea how to disable paging [21:47:27] not right now … :) [21:47:48] !log maxsem Started syncing Wikimedia installation... : Weekly mobile deployment, 2nd attempt [21:47:55] Logged the message, Master [21:48:18] hashar: it will not page me [21:48:22] (no worries Leslie, I'm pretty sure I understand it) [21:48:38] hashar: you dont have to disable it, you have to enable it [21:48:44] mutante: ahh [21:48:46] thanks :) [21:48:58] monitor_service { 'lucene_search': description => 'Lucene search', check_command => "check_luc [21:49:01] ene_frontend" } [21:49:02] not paging [21:49:08] mutante: make sure to ping notpeter IRL about the new search monitoring [21:49:49] monitor_service { 'lucene_search': description => 'Lucene search', check_command => "check_lucene_frontend", critical => true } [21:49:55] <-- that WOULD be paging, hashar [21:50:16] nagios.pp:define monitor_service ... $critical="false" [21:50:19] <- default is false [21:51:44] \O/ [21:52:15] done in not-RL :p [21:54:58] * jeremyb_ senses that dr0ptp4kt just figured out he was in the wrong place [21:55:43] * jeremyb_ senses that dr0ptp4kt is going to be one of those people I have a lot of trouble consistently associating with the right other names. e.g. abaso [21:58:03] jeremyb: :) [22:04:53] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [22:04:58] !log maxsem Finished syncing Wikimedia installation... : Weekly mobile deployment, 2nd attempt [22:05:05] Logged the message, Master [22:07:03] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [22:07:33] PROBLEM - Lucene search on search1021 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 58473 bytes in 0.027 second response time [22:07:43] PROBLEM - Lucene search on search1015 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 55565 bytes in 0.011 second response time [22:09:24] that's not so useful :( [22:09:46] what's the problem now [22:10:04] the only relevant thing it says is CRITICAL [22:10:04] i just asked a couple times.. 
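The mobile HTTPS question earlier in this stretch is settled empirically: request the mobile site over HTTPS and read the X-Cache header, which names the answering frontend (the log shows "X-Cache: cp1041 frontend miss (0)"). A small probe in that spirit is sketched below; the URL and the sample size are illustrative choices, not taken from the log.

```python
#!/usr/bin/env python3
"""Sketch: tally which varnish frontends answer mobile HTTPS requests.

The X-Cache response header names the frontend cache (the log above shows
"X-Cache: cp1041 frontend miss (0)"). URL and sample size are arbitrary.
"""
from collections import Counter
from urllib.request import urlopen

URL = "https://en.m.wikipedia.org/"   # assumption: any mobile page will do
SAMPLES = 20                          # arbitrary

seen = Counter()
for _ in range(SAMPLES):
    with urlopen(URL, timeout=10) as resp:
        xcache = resp.headers.get("X-Cache", "")
        # the first token is the cache host, e.g. "cp1041"
        seen[xcache.split()[0] if xcache else "unknown"] += 1

for host, count in seen.most_common():
    print("%-10s %d" % (host, count))
```

If every sample reports one of cp1041-cp1044, the analytics assumption that those are the only frontends serving mobile traffic holds for HTTPS as well.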
[22:10:17] it doesn't say anything about *what* is broken [22:10:40] 14:55 < hashar> mutante: basically I want to warn whenever the lucene /status page contains FAILED :-] [22:10:55] right [22:11:03] * jeremyb_ may tweak it later [22:11:33] 14:43 < hashar> mutante: xyzram is aware of it since he posted on ops-l [22:11:37] 14:44 < jeremyb_> you can check the nagios log to see if it really paged [22:12:48] jeremyb_: yeah I know, but that is better than nothing [22:12:54] ideally we should write our own plugin [22:14:13] Change merged: MaxSem; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/55228 [22:14:29] the 2.5 people who could fix it, already know what it means , right [22:16:25] yup [22:16:30] I guess it is good enough for now [22:16:43] PROBLEM - Puppet freshness on mw1160 is CRITICAL: Puppet has not run in the last 10 hours [22:16:44] jeremyb_: feel free to write a nagios plugin :-] [22:17:18] hashar: i feel exactly that way :) [22:19:36] New patchset: awjrichards; "Make Special:LoginHandshake used everywhere" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56327 [22:20:21] New patchset: Dzahn; "be more specific about tell *what* is broken in new search monitoring" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56328 [22:20:28] Change merged: MaxSem; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56327 [22:22:07] !log maxsem synchronized wmf-config/InitialiseSettings.php [22:22:14] Logged the message, Master [22:23:55] !log maxsem synchronized wmf-config/InitialiseSettings.php [22:24:03] Logged the message, Master [22:30:21] New patchset: Dzahn; "be more specific about *what* is broken in new search monitoring" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56328 [22:30:55] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56328 [22:32:00] :) [22:32:04] I am out to bed *wave* [22:32:17] bye [22:37:10] PROBLEM - Lucene search on search1020 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 61432 bytes in 0.030 second response time [22:37:29] PROBLEM - Lucene search on search1022 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 58473 bytes in 0.019 second response time [22:37:29] PROBLEM - Lucene search on search16 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 61432 bytes in 0.112 second response time [22:37:39] PROBLEM - Lucene search on search1019 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 61432 bytes in 0.015 second response time [22:37:39] PROBLEM - Lucene search on search15 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 61432 bytes in 0.110 second response time [22:47:13] !log maxsem synchronized php-1.21wmf12/extensions/MobileFrontend 'touch' [22:47:20] Logged the message, Master [22:49:46] !log maxsem synchronized php-1.21wmf11/extensions/MobileFrontend 'touch' [22:49:53] Logged the message, Master [22:50:56] New patchset: Yurik; "Added Vimpelcom Mobilink Pakistan ACLs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56333 [22:52:21] yurik__: most of pakistan reads english? [22:55:07] New patchset: Reedy; "Basic config for ukwikivoyage and hewikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56335 [22:55:43] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56335 [22:56:19] woooo, Reedy's making wikivoyages! 
[22:57:34] :) [23:00:36] New review: Yurik; "Need to revert landing page for single-language telcos" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/55302 [23:01:10] !log reedy synchronized wmf-config/InitialiseSettings.php [23:01:17] Logged the message, Master [23:02:22] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: [23:02:29] Logged the message, Master [23:03:19] New patchset: Reedy; "dblists and wikiversions" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56338 [23:03:32] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56338 [23:04:18] PROBLEM - Puppet freshness on db11 is CRITICAL: Puppet has not run in the last 10 hours [23:05:58] PROBLEM - search indices - check lucene status page on search1019 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 61432 bytes in 0.016 second response time [23:06:18] PROBLEM - search indices - check lucene status page on search1021 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 58473 bytes in 0.010 second response time [23:06:28] PROBLEM - search indices - check lucene status page on search1015 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 55565 bytes in 0.012 second response time [23:06:28] PROBLEM - RAID on db11 is CRITICAL: CRITICAL: Defunct disk drive count: 1 [23:06:38] PROBLEM - search indices - check lucene status page on search1022 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 58473 bytes in 0.011 second response time [23:06:38] PROBLEM - search indices - check lucene status page on search16 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 61432 bytes in 0.110 second response time [23:10:48] !log reedy synchronized wmf-config/InitialiseSettings.php [23:10:55] Logged the message, Master [23:11:52] I hope those lucene alerts didn't page anyone [23:14:32] TimStarling: they did not, they dont have critical => true [23:15:09] yurik: manifests/site.pp line 273 [23:15:11] they are detecting string FAILED on lucene status pages [23:16:03] did not :) woot [23:16:10] except i love the paging on 200 OK [23:17:05] it is technically correct, it tells you it gets a 200 BUT found a string FAILED inside the page [23:17:11] !log reedy synchronized php-1.21wmf12/cache/interwiki.cdb 'Updating 1.21wmf12 interwiki cache' [23:17:12] which in turn tells us.. to check index creation [23:17:17] Logged the message, Master [23:18:05] it's this: https://gerrit.wikimedia.org/r/#/c/51174/4 [23:20:10] thanks LeslieCarr !! 
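The "HTTP CRITICAL: HTTP/1.1 200 OK - pattern found" output is check_http doing exactly what change 51174 asked of it: the page loads fine (200) but matches the inverted FAILED regex, so the state is CRITICAL without saying which index is broken. The dedicated plugin hashar and jeremyb_ mention would go one step further and name the failing indices. A minimal sketch of that idea, assuming only what the log states about the status page (port 8123, a /status page whose broken indices contain the string FAILED) and the usual Nagios exit codes:

```python
#!/usr/bin/env python3
"""Sketch of a lucene-search status check that names the broken indices.

Assumptions: the status page lives at http://HOST:8123/status and lines for
broken indices contain the string "FAILED" (that is all the log tells us).
Exit codes follow the Nagios convention: 0=OK, 2=CRITICAL, 3=UNKNOWN.
"""
import sys
from urllib.request import urlopen
from urllib.error import URLError

def main():
    host = sys.argv[1] if len(sys.argv) > 1 else "search1019.eqiad.wmnet"
    url = "http://%s:8123/status" % host
    try:
        with urlopen(url, timeout=10) as resp:
            body = resp.read().decode("utf-8", "replace")
    except URLError as e:
        print("UNKNOWN: could not fetch %s: %s" % (url, e))
        return 3

    failed = [line.strip() for line in body.splitlines() if "FAILED" in line]
    if failed:
        print("CRITICAL: %d failed indices: %s"
              % (len(failed), "; ".join(failed[:5])))
        return 2
    print("OK: no FAILED indices on %s" % host)
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Hooked up through the existing monitor_service definition (still without critical => true, as discussed above), a check like this would keep the non-paging behaviour but put the failing index names straight into the alert text.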
[23:22:50] !log upgrading wikitech to wmf/1.21wmf12 [23:22:52] !log reedy synchronized wmf-config/InitialiseSettings.php [23:22:56] Logged the message, Master [23:23:02] Logged the message, Master [23:24:18] !log finished upgrading wikitech [23:24:24] TimStarling: Numerous of the revisions we added to newdeploy will need merging into master of mediawiki-config [23:24:24] Logged the message, Master [23:24:32] due to all the hard coded paths and such [23:28:27] New patchset: Reedy; "More configuration for new wikivoyages" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56339 [23:28:37] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56339 [23:38:38] !log updating wikitech-static to wmf/1.21wmf12 [23:38:44] !log reedy synchronized wmf-config/InitialiseSettings.php [23:38:45] Logged the message, Master [23:38:51] Logged the message, Master [23:40:21] !log finished updating wikitech-static [23:40:28] Logged the message, Master [23:41:34] PROBLEM - search indices - check lucene status page on search1020 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 61432 bytes in 0.024 second response time [23:41:54] PROBLEM - search indices - check lucene status page on search15 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 61432 bytes in 0.110 second response time [23:44:00] !log reedy synchronized wmf-config/InitialiseSettings.php [23:44:07] Logged the message, Master [23:46:19] !log reedy synchronized wmf-config/InitialiseSettings.php [23:46:26] Logged the message, Master [23:48:40] New patchset: Reedy; "Rest of config for ukwikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56341 [23:49:20] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56341 [23:58:42] !log reedy synchronized php-1.21wmf12/extensions/Wikibase [23:58:48] Logged the message, Master
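After a round of "rebuilt wikiversions.cdb" and sync runs like the ones above, the deployed branch can also be confirmed from the outside: the MediaWiki API's meta=siteinfo query reports the running version in its generator field (for example "MediaWiki 1.21wmf12"). A small sketch follows; the list of wikis is just an example.

```python
#!/usr/bin/env python3
"""Sketch: confirm which MediaWiki branch a few wikis are actually serving.

meta=siteinfo returns a "generator" field such as "MediaWiki 1.21wmf12".
The wikis listed below are arbitrary examples.
"""
import json
from urllib.request import urlopen

WIKIS = [
    "en.wikipedia.org",
    "uk.wikivoyage.org",   # created in the log above
    "he.wikivoyage.org",
]

for wiki in WIKIS:
    url = ("https://%s/w/api.php?action=query&meta=siteinfo"
           "&siprop=general&format=json" % wiki)
    with urlopen(url, timeout=10) as resp:
        general = json.loads(resp.read().decode("utf-8"))["query"]["general"]
    print("%-20s %s" % (wiki, general.get("generator", "unknown")))
```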