When searching for a period for the star KELT KC11C028126, I used ZTF survey data (2 files: zg and zr) for input via VStar's Flexible Text Format File plug-in. After loading both files, I shifted the g data up over the r (by -0.65) using the Baseline Shifter tool.
I then used the "AOV tool with Period Range" to obtain a periodogram with a "top hit" of 2.81117, which in turn yielded a nice tight phase plot, based on a period of 2.81117 d and an epoch of 2458221.241. This was based on using the zg data alone, a range of 2.8 to 2.9 and a resolution of 0.000001.
When I tried the same thing again, however, using the zr data alone and the same parameters, the VStar periodogram came up with a "top hit" of 2.81146, which yielded a much less tidy Phase Plot. When I plotted the zr data alone, using the 2.81117 value, the Plot became "tidy" again, suggesting strongly that 2.81117 was the correct period for both the red and the green, as you would expect (same star, different bands).
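For readers who want to see the folding step outside VStar, here is a minimal Python sketch. It is not VStar's code: the function names `fold` and `cycle_drift` are made up for illustration, and the 1000-day baseline is a hypothetical value, since the actual time span of the ZTF data is not stated in this thread. It shows how a phase is computed from a period and epoch, and how much phase two nearby trial periods drift apart over a given baseline.

```python
def fold(times, period, epoch):
    """Return the phase in [0, 1) of each time for the given period and epoch."""
    phases = []
    for t in times:
        phase = ((t - epoch) / period) % 1.0
        if phase >= 1.0:  # guard against floating-point rounding up to 1.0
            phase = 0.0
        phases.append(phase)
    return phases


def cycle_drift(period_a, period_b, time_span):
    """Phase difference (in cycles) that builds up between two trial periods
    over a baseline of time_span days."""
    return time_span * abs(1.0 / period_a - 1.0 / period_b)


# Periods and epoch quoted in the post above.
PERIOD_G = 2.81117
PERIOD_R = 2.81146
EPOCH = 2458221.241

# HYPOTHETICAL baseline of 1000 days (the real span is not given in the thread).
drift = cycle_drift(PERIOD_G, PERIOD_R, 1000.0)
print(f"Phase drift over 1000 d: {drift:.4f} cycles")
```

The longer the baseline, the more a small difference in trial period smears the folded curve, which is one way to judge by eye which of two candidate periods folds the data more tightly.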
All of this is fairly easy to reproduce with VStar, if you are interested. I don't know how to attach files in an AAVSO forum but if you email me direct (yorkpf@gmail.com) I can send you the two input .csv files.
This raises a couple of questions:
1. Why did the AOV periodogram not yield the correct period on the zr data, using the same search parameters as used for the zg?
2. What is the thinking behind VStar's forcing a choice of which band to use for the periodogram, in the first place? In other words, why not always (normally?) base the periodogram on both bands?
Paul
Hi Paul
It would be so good to be able once again to attach a limited subset of file types to our forum posts...
I was able to reproduce your results with the CSVs you sent, thanks.
Try an AoV range of 2.8 to 2.9 and resolution of 0.00001 with 50 bins. This quickly gives 2.81122 and 2.81123 for zr and zg respectively.
Both appear to be better than 2.81117, at least by visual inspection. What do you think?
Looking at the F-statistic in Current Mode ANOVA with 50 bins (0.02 Phase Steps per Mean Series Bin in Phase Control Dialog) may help, by comparison to what you see in the AoV result table.
Re: 1. from VStar's perspective of course, the zg and zr data are totally different datasets, with a different number of observations, taken through a different filter, with different characteristics. Is 2.81117 correct? I'm not sure it is.
Re: 2. I remember asking Arne Henden the same question during a break at Citizen Sky I. I thought of checkboxes vs radio buttons that would have allowed multiple series to be selected. There are a few reasons, but it comes back to what I said re: 1: they're different datasets, each series. Sometimes it does help to combine them and a Filter can be used to create a single set of data to be analysed (or a new series).
Thoughts?
David
It is common that data sets through different filters have different shapes and may even have different times of minima/maxima. Combining them can greatly increase the scatter and make analysis more difficult, less precise and less accurate.
Brad Walter, WBY
Hi Brad
You said: "... data sets through different filters have different shapes and may even have different times of minima/maxima. Combining them can greatly increase the scatter and make analysis more difficult, less precise and less accurate".
Whilst I agree with your statement, as far as it goes, I don't believe it is relevant to the problem I have presented.
It is important to focus on the fact that we are dealing here only with eclipsing binaries (I'm not sure that I pointed that out in my original posting). For eclipsing binaries, the zr and zg bands ought to yield the same period. After all, what possible physical mechanism could account for a different periodicity in two different passbands, when both the red and green series have been captured over the same timeframe with the same cadence? We are talking here about two stars in orbit about one another, not "true" or intrinsic variable stars in the pulsating sense.
At the same time, I acknowledge that different amounts of scatter in the data could occur and (of course) different magnitudes and different maxima and minima. Having said that, periodicity is the only measurement at stake here and with these stars the two bands normally have identical periods.
Indeed, that is the case here. It can be shown (and I have checked) that if the red and green are folded together using a period of 2.81117, they superimpose nicely on top of one another, showing they have the same period.
My issue here is that, when processing the red alone, VStar did not come up with the 2.81117 period at all. It came up with 2.81146, which is incorrect because it does not work at all as a basis for folding a light curve. This is something that still needs to be explained.
The fact that by using 50 bins
Hi David: You said "... try an AoV range of 2.8 to 2.9 and resolution of 0.00001 with 50 bins. This quickly gives 2.81122 and 2.81123 for zr and zg respectively. Both appear to be better than 2.81117, at least by visual inspection. What do you think?"
IMHO, anyone looking at these curves could pick any one of the three. Yes, you can see minor differences from one to another but there is no clear and repeatable basis for choosing any particular one over the others. The reason I chose 2.81117 is that it was clearly much better than the previous period stored in the VSX database (2.8114224). Trying other values close to 2.81117 (on both sides of it) didn't lead to any clear reduction in dispersion of the folded light curve.
So, why wouldn't anyone else using VStar simply stop the analysis where I did? What is the practical trigger that would make one do a bin change and then why 50 bins rather than some other number?
In any case, my real issue for this posting is as described in my comments (above) on Brad's comments. I'm not really concerned here with what the "real" period is.
Cheers
Paul
Hi Paul
Re: "...why wouldn't anyone else using VStar simply stop the analysis where I did? What is the practical trigger that would make one do a bin change and then why 50 bins rather than some other number?"
You found what looked like a reasonable period with the g data but not r, with the same AoV parameters. I changed the bins from 10 to 50 because my thinking was that the number of bins was not modelling the data well for the ANOVA calculation per trial fold. I tried other bin sizes before success: 20 at least as I recall.
The reason why binned means may not be a good model of the folded r data becomes evident when you select the Means series in the Phase Plot Control dialog and leave the Phase Steps per Mean Series bin at the default of 0.1 (1/10).
Now change it to 0.02 (1/50). Do you see what I mean?
I would show phase plots with binned means here if I could...
So the choice to change the bins for r data was because of the lack of agreement between AoV g and r.
Nothing I've said explains why 10 bins were enough for a reasonable result for the g data. More thought and input from others may help. The only observation I would make at the moment is that the out of eclipse scatter appears to be less for the g vs r data.
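To make the role of the bin count concrete, here is a minimal Python sketch of a phase-binned ANOVA period search. It is an illustration under stated assumptions, not VStar's implementation: the function names `aov_f_statistic` and `aov_scan` are hypothetical, empty bins are simply ignored, and the epoch defaults to zero. For each trial period the data are folded, grouped into phase bins, and the F-statistic (between-bin variance over within-bin variance) is computed; the trial period with the largest F is the "top hit".

```python
import math


def aov_f_statistic(times, mags, period, bins=10, epoch=0.0):
    """One-way ANOVA F-statistic of magnitudes grouped into phase bins."""
    groups = [[] for _ in range(bins)]
    for t, m in zip(times, mags):
        phase = ((t - epoch) / period) % 1.0
        index = min(int(phase * bins), bins - 1)  # clamp rounding at phase ~ 1
        groups[index].append(m)

    groups = [g for g in groups if g]  # assumption: ignore empty bins
    n = sum(len(g) for g in groups)
    r = len(groups)
    if r < 2 or n <= r:
        return math.nan  # not enough data to form the statistic

    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variance of the bin means about the grand mean (r - 1 degrees of freedom).
    between = sum(len(g) * (mu - grand_mean) ** 2
                  for g, mu in zip(groups, means)) / (r - 1)
    # Variance of the points about their own bin mean (n - r degrees of freedom).
    within = sum((x - mu) ** 2
                 for g, mu in zip(groups, means) for x in g) / (n - r)
    if within == 0.0:
        return math.inf if between > 0.0 else math.nan
    return between / within


def aov_scan(times, mags, p_min, p_max, resolution, bins=10):
    """Return (period, F) for the trial period with the largest F-statistic."""
    steps = int(round((p_max - p_min) / resolution))
    best_period, best_f = None, -math.inf
    for i in range(steps + 1):
        period = p_min + i * resolution
        f = aov_f_statistic(times, mags, period, bins)
        if f > best_f:  # NaN never compares greater, so it is skipped
            best_period, best_f = period, f
    return best_period, best_f
```

Running `aov_scan` on the same data with `bins=10` and then `bins=50` is one way to see how the bin count changes the ranking of trial periods: with wide bins, a narrow eclipse shares its bin with out-of-eclipse points, which raises the within-bin variance and flattens the peak.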
David
A very interesting discussion! I am a relative novice, so what follows is my attempt to understand the issues involved. Some canonical examples would be extremely useful!
I guess the problem can be broken down into two parts:
My own (limited) experience is as follows:
This seems to make sense if one imagines two stars with different colors eclipsing one another. I guess a complicating factor might be red-shifting/blue-shifting depending on whether the component is moving away or towards the observer’s line of sight (though I have no idea whether this would have any material effect on the survey observations we are using).
What I’m having a hard time understanding is how the period duration itself could be different for different bands?
On the question of computer algorithms, I think it is easy to see how:
The computer algorithms we use can be very sensitive to the number and distribution of observations.
However, if the heart of the algorithm is some kind of curve fitting (or minimizing distribution) to observations in a number of bins in different trial periods, then it would seem that:
..should increase the chances of the algorithm getting the period “right”. And, if that is the case, then combining bands could be helpful?
-leo
Hi Leo
Nice overview.
Agreed re: examples. The one Paul provided is certainly interesting.
The differences in shoulder/descent points you make could lead to increased scatter when bands are combined. The scatter problem has come up a few times here (e.g. in Brad's reply). If so, combining bands could make things worse.
It doesn't seem to be the case here though. If I use the Magnitude Baseline Shifter plugin (+0.65 on r band after additively loading r and g files provided by Paul), View -> Filter from Plot, then AoV with range 2.8 to 2.9, resolution 0.00001, and 10 bins on the Filter series, I get 2.81123 as the top hit with the last top hit being 2.81117 (ranked by F-statistic values). Flicking between those folded light curves in Analysis -> Previous Phase Plots helps to see the difference.
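The shift-and-combine step described above can be sketched in Python as follows. This is only an illustration of the idea, not what the Magnitude Baseline Shifter plugin does internally. The names `shift_and_merge` and `median_offset` are hypothetical, the sample observations are made up, and estimating the offset from medians is an assumption of this sketch (in the workflow above the +0.65 offset was entered by hand).

```python
from statistics import median


def shift_and_merge(g_obs, r_obs, r_offset):
    """Add r_offset to every r magnitude, then merge both bands by time.

    Each observation is a (time, magnitude) tuple.
    """
    shifted_r = [(t, mag + r_offset) for t, mag in r_obs]
    return sorted(g_obs + shifted_r)


def median_offset(g_obs, r_obs):
    """HYPOTHETICAL helper: estimate the g-minus-r offset from the medians."""
    return median(mag for _, mag in g_obs) - median(mag for _, mag in r_obs)


# Made-up observations, for illustration only.
g_band = [(2458221.2, 13.00), (2458223.9, 13.02), (2458226.8, 13.45)]
r_band = [(2458222.1, 12.36), (2458224.0, 12.34), (2458227.1, 12.80)]

combined = shift_and_merge(g_band, r_band, 0.65)  # +0.65 on r, as in the post
print(len(combined), "observations in the combined series")
```

The combined list can then be passed to a period search as a single series, which corresponds to running AoV on the Filter series in VStar.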
With just the g data alone and the AoV parameters mentioned, 2.81117 is the top hit. 50 bins gives 2.81121. Again, adding the mean series and changing means series source to the g data helps to see why 10 or 20 bins does not yield a good model while 30 or 50 does.
David
Certainly a very interesting discussion. However, looking back over the exchange again, I am not sure that I made myself clear enough. So let me be specific:
1. The key issue here is the period (in the r and g bands), not amplitude, scatter or anything else.
2. I am restricting myself to just eclipsing binaries (not pulsating types) so it is very difficult to see why r and g bands should yield different periods, and only in the g case is the period actually correct (which can be proved by folding each light curve - the g curve is "nice", the r curve is not).
3. In a previous exchange, David, in reply to my question "...why wouldn't anyone else using VStar simply stop the analysis where I did? What is the practical trigger that would make one do a bin change and then why 50 bins rather than some other number?", you wrote: "You found what looked like a reasonable period with the g data but not r, with the same AoV parameters. I changed the bins from 10 to 50 because my thinking was that the number of bins was not modelling the data well for the ANOVA calculation per trial fold. I tried other bin sizes before success: 20 at least as I recall".
However, it is important to bear in mind that the way I have been operating, when looking for a period, is to analyse the green data only with the AOV periodogram. What happens then is that the top hit/s are plugged into the Phase Plot tool and BOTH the g and the r curves are folded. When this is done, you can tell by eye whether the hit works for both bands. In every case so far, in my experience, the period arrived at in this way works for both bands. So there is no need to go looking for a period in the r band. It is also the case that the combined (superimposed) Phase Plot is what is submitted to the VSX mediators. So in the end, both bands end up on the same diagram.
Now to my point about a "trigger". As you can see from the above discussion (I hope), there is not ordinarily any "trigger" or clue that one should look at the red. The only reason I chose to do so is pure curiosity.
So I think the issue still stands: There is good reason to think that the same algorithm (on eclipsing binary data) should produce the same period (at least as one of its "top hits"), especially when the number of observations in each band is of the same order and over the same timeframe. This is happening for the g but not for the r. Not only does analysis of the r not come up with the same period as that obtained for the g (which is demonstrably correct for both bands) but it comes up with a period for the r that is demonstrably wrong for both r and g.
4. The issue of the number of bins to choose is, I think, a different issue entirely. Guidance is needed on how to choose the number of bins before the analysis commences, i.e. why it should be changed from 10. Once the analysis has completed, no matter what period it yields, there is no reason or trigger to think that a change of bins might improve things.
Cheers: Paul
While I have not forgotten about Paul's last post here, in an email communication, Paul suggested a GitHub issue would be useful for capturing discussion/progress. I agree, and here it is: https://github.com/AAVSO/VStar/issues/346
Input from others here or in that issue is welcome.
I also wonder whether pointing to this forum topic in the Data Analysis forum would be worthwhile.
David