Identifying Optical Counterparts to X-ray Sources#

From this point on, the methodology for identifying the best candidate counterpart for each XRB becomes fairly open-ended. While there are still specific steps that need to be taken, there are countless ways to do so! Here, I detail the steps I typically take to optimize my workflow. The details of your workflow do not matter as long as you accomplish the following:

  1. Identify optical counterparts that fall within 2-\(\sigma\) of each X-ray source;

  2. Within this population, identify contaminants (i.e. foreground stars, background galaxies), clusters, and point-sources (i.e. candidate donor stars);

  3. Extract the photometry from clusters and point-sources (AKA stars);

  4. Estimate the masses of point-sources and ages of clusters to determine the most likely mass classification (low, intermediate, or high) of each XRB; and

  5. Flag possible supernova remnants (SNR).

This chapter covers step 1. The next chapter, Classifying Optical Counterparts, covers steps 2 and 5. Estimating XRB Masses with CMDs and CCDs covers steps 3 and 4.

Isolating Candidate Counterparts with DaoClean#

Now that we’ve identified all HST optical point sources with XRBID.AutoPhots.RunPhots() or photutils.detection.DAOStarFinder() and calculated the best positions and 2-\(\sigma\) radii of each CXO X-ray source, we can finally select the candidate optical counterparts for each XRB. This is done by isolating all sources that fall within the 2-\(\sigma\) radius of each X-ray source. One can do so manually, but the easiest way is to run Sources.DaoClean().

DaoClean() works by taking the coordinates of each source in one DataFrame (passed as sources) and searching a second DataFrame (passed as daosources) for entries that fall within a given radius of those coordinates. It returns a DataFrame equivalent to the one passed as daosources, but containing only the entries that fall within the search radius of at least one entry in sources. So, for the XRBs in this example, the DataFrame given as sources contains each X-ray source, its coordinates, and its calculated 2-\(\sigma\) radius. The DataFrame given as daosources, on the other hand, contains the point sources identified by RunPhots() (or DAOStarFinder()) in your base filter of choice.
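To make the idea concrete, here is a minimal sketch of this kind of radius search in plain Python. It is not the DaoClean() implementation: the function name, the dictionary keys, and the toy coordinates are all assumptions made for illustration.

```python
import math

def select_within_radius(sources, daosources):
    """Keep the daosources that fall inside the search radius of any source.

    Both inputs are lists of dicts with pixel coordinates 'X' and 'Y'.
    Each entry of `sources` also carries an 'ID' and its own radius 'R' (pixels).
    These key names are hypothetical, chosen only for this sketch.
    """
    kept = []
    for src in sources:
        for dao in daosources:
            dist = math.hypot(dao["X"] - src["X"], dao["Y"] - src["Y"])
            if dist <= src["R"]:
                # Record which X-ray source this candidate belongs to
                kept.append({**dao, "SourceID": src["ID"]})
    return kept

# Toy data: one X-ray source with a 20-pixel radius and two optical sources
xray = [{"ID": "X1", "X": 100.0, "Y": 100.0, "R": 20.0}]
stars = [
    {"DaoID": 1, "X": 105.0, "Y": 110.0},  # about 11 pixels away: kept
    {"DaoID": 2, "X": 130.0, "Y": 100.0},  # 30 pixels away: dropped
]
candidates = select_within_radius(xray, stars)
print(candidates)
```

A nested loop like this is fine for a toy case, but with hundreds of X-ray sources and well over a hundred thousand optical sources it becomes slow, which is presumably why the real search takes a few minutes.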

Before running DaoClean(), you will need to place the coordinates of each RunPhots()/DAOStarFinder() point source into a new DataFrame. This can be done by using DataFrameMod.BuildFrame() or Sources.LoadSources() on the .ecsv data file that was produced in the previous chapter (by default, RunPhots() saves this as something similar to photometry_M101_f555w_acs_full.ecsv), but only if you did not apply an additional shift to your region file. In my case, I decided to apply a shift during the RunPhots() phase, because the region files that were produced did not align well to the HST images for reasons unknown.

Because my region files are shifted, and because my X-ray coordinates were aligned to these shifted regions, I need to create a DataFrame using the coordinates of those shifted region files. While we’re at it, one may find it’s a good idea to compile both the img and fk5 coordinates into a single DataFrame, for greater flexibility later down the line.

from XRBID.Sources import GetCoords, GetIDs
from XRBID.DataFrameMod import BuildFrame

# Pulling coordinate and ID information from region files
x,y = GetCoords("../testdata/M101_daofind_f555w_acs_img.reg")
ids = GetIDs("../testdata/M101_daofind_f555w_acs_img.reg")
ra,dec = GetCoords("../testdata/M101_daofind_f555w_acs_fk5.reg")

# Compiling into a single DataFrame
DaoFrame = BuildFrame(headers=['DaoID','X','Y','RA','Dec'], 
                      values=[ids,x,y,ra,dec])
DaoFrame.to_csv("../testdata/M101_daofind_acs_coords.frame")
# To read in and check the .frame file saved above:
import pandas as pd 

# I choose to use pd.read_csv here because of how big this file is
DaoFrame = pd.read_csv("../testdata/M101_daofind_acs_coords.frame").drop("Unnamed: 0", axis=1)
display(DaoFrame)
DaoID X Y RA Dec
0 1 7710.069326 3655.196224 210.888379 54.225405
1 2 7712.935688 3688.774732 210.888312 54.225872
2 3 7926.745455 3696.670351 210.883232 54.225984
3 4 7740.506306 3722.354818 210.887658 54.226338
4 5 7802.588601 3728.903381 210.886183 54.226430
... ... ... ... ... ...
178097 178098 8939.411035 19364.831601 210.859372 54.443607
178098 178099 9082.172967 19379.774137 210.855963 54.443815
178099 178100 8920.492988 19385.066555 210.859824 54.443888
178100 178101 8947.054147 19392.263104 210.859190 54.443988
178101 178102 9038.069684 19406.788855 210.857016 54.444190

178102 rows × 5 columns
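For readers curious about what pulling coordinates and IDs out of a region file involves, the sketch below parses DS9 circle regions with a regular expression. It is not the code behind GetCoords() or GetIDs(). It assumes decimal coordinates, circle regions only, and IDs stored in a text={...} tag. The radius of 3 and the labels in the example text are made up.

```python
import re

# Matches lines such as: circle(7710.069326,3655.196224,3) # text={1}
CIRCLE = re.compile(
    r"circle\(\s*([-+\d.eE]+)\s*,\s*([-+\d.eE]+)\s*,[^)]*\)"
    r"(?:.*text=\{([^}]*)\})?"
)

def parse_circles(region_text):
    """Return lists of x, y, and label for every circle region in the text."""
    xs, ys, ids = [], [], []
    for line in region_text.splitlines():
        match = CIRCLE.search(line)
        if match is None:
            continue  # header lines, coordinate-system lines, other shapes
        xs.append(float(match.group(1)))
        ys.append(float(match.group(2)))
        ids.append(match.group(3))  # None if the region has no text tag
    return xs, ys, ids

# Hypothetical region file contents (radius and labels are illustrative)
example = """# Region file format: DS9 version 4.1
image
circle(7710.069326,3655.196224,3) # text={1}
circle(7712.935688,3688.774732,3) # text={2}
"""
x, y, ids = parse_circles(example)
print(x, y, ids)
```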

Important

DaoClean allows the user to run the radius search in the fk5 (degree) coordinate system. However, because of how this coordinate system works and how the image is stretched, the search may return unexpected results (e.g. returning sources within an ellipse around each XRB instead of a circle). It is recommended that you use the img coordinate system instead, which returns more predictable results.
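One likely contributor to the elliptical search area is that a degree of right ascension covers less sky than a degree of declination away from the celestial equator, by a factor of cos(Dec). The sketch below illustrates this at the declination of M101. Whether this is the exact cause inside DaoClean() is my assumption, and the helper function is hypothetical.

```python
import math

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Small-angle separation in degrees, scaling the RA offset by cos(Dec)."""
    mean_dec = math.radians(0.5 * (dec1 + dec2))
    d_ra = (ra2 - ra1) * math.cos(mean_dec)
    d_dec = dec2 - dec1
    return math.hypot(d_ra, d_dec)

ra0, dec0 = 210.802227, 54.348952  # position of one M101 X-ray source, in degrees
offset = 1.0 / 3600.0              # 1 arcsec expressed in degrees

# Shift the position by the same coordinate offset in RA and then in Dec
sep_ra = angular_sep_deg(ra0, dec0, ra0 + offset, dec0) * 3600.0
sep_dec = angular_sep_deg(ra0, dec0, ra0, dec0 + offset) * 3600.0
print(f"1 arcsec coordinate offset in RA  -> {sep_ra:.3f} arcsec on the sky")
print(f"1 arcsec coordinate offset in Dec -> {sep_dec:.3f} arcsec on the sky")
```

At this declination the RA offset corresponds to only about 0.58 of the Dec offset on the sky, so a search that treats RA and Dec differences equally selects an ellipse on the image rather than a circle.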

As noted above, it is best to run DaoClean() using the img coordinate system, which gives coordinates and radii in pixel units relative to the reference image, so we will also need to get the img coordinates of the X-ray sources (which up until this point have been plotted in fk5 coordinates).

In DS9, open one of the region files containing the corrected coordinates of the X-ray sources and resave it in DS9 using image coordinates. Then read in the image coordinates and save them to your X-ray source DataFrame. Here, I saved my new region file as M101_bestrads_2sig_img.reg.

from XRBID.Sources import LoadSources, GetCoords

# Loading in the original DataFrame containing the best 2sig radii for each source
M101_best = LoadSources("../testdata/M101_csc_bestrads.frame")

# Reading in the newly-saved img region file
xsources, ysources = GetCoords("../testdata/M101_bestrads_2sig_img.reg")
# Adding the image coordinates to the DataFrame
M101_best["X"] = xsources
M101_best["Y"] = ysources

# Convert the radius from arcseconds to pixels by dividing by the pixel scale. 
# The pixel scale is 0.05 arcsec/pixel for ACS/WFC and 0.03962 arcsec/pixel for WFC3/UVIS
M101_best["2Sig (pix)"] = M101_best["2Sig"] / 0.05 

# Saving the changes to the DataFrame
M101_best.to_csv("../testdata/M101_csc_bestrads.frame")
display(M101_best)
Reading in sources from ../testdata/M101_csc_bestrads.frame...
Retrieving coordinates from ../testdata/M101_bestrads_2sig_img.reg
Separation CSC ID RA Dec ExpTime Theta Err Ellipse Major Err Ellipse Minor Err Ellipse Angle Significance ... MS Ratio lolim MS Ratio hilim Counts Counts lolim Counts hilim 1Sig 2Sig X Y 2Sig (pix)
0 0.835778 2CXO J140312.5+542056 210.802227 54.348952 98379.672624 1.280763 0.296164 0.295254 89.606978 22.302136 ... -0.616490 -0.485322 234.492581 216.678502 252.306660 0.493011 0.988256 11333.8500 12549.3440 19.765113
1 1.951564 2CXO J140312.7+542055 210.803345 54.348663 49085.676020 4.725025 0.548072 0.380093 86.863933 2.648649 ... -1.000000 -0.715178 209.970149 192.843759 227.096539 0.516471 1.038077 11286.9600 12528.5350 20.761533
2 2.480949 2CXO J140312.5+542053 210.802221 54.348072 98379.672624 1.321652 0.296164 0.295733 84.016491 19.554042 ... -0.364147 -0.265459 252.846172 234.295673 271.396671 0.492957 0.988086 11334.1110 12486.0170 19.761726
3 5.227586 2CXO J140313.1+542052 210.804553 54.347994 132129.439441 2.762119 0.880650 0.578584 29.467809 3.421053 ... -0.973766 -0.718926 23.229213 16.216620 30.241806 0.561749 1.188360 11236.2880 12480.3160 23.767200
4 9.124404 2CXO J140313.5+542053 210.806667 54.348188 98379.672624 1.410741 0.633314 0.466131 58.539300 1.739130 ... -1.000000 -0.370394 7.040343 2.992146 11.088540 0.593981 1.300029 11147.5470 12494.3280 26.000572
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
550 883.411465 2CXO J140138.8+541527 210.411860 54.257747 132129.439441 12.087047 8.647887 6.103176 111.468541 4.277778 ... -0.932542 -0.617739 76.034629 52.273807 99.795450 1.921653 4.517698 27753.7720 6032.4759 90.353960
551 884.529187 2CXO J140450.6+541721 211.211172 54.289330 139822.610581 9.951487 1.357801 1.057531 95.305118 8.976584 ... -0.472829 -0.270456 155.562531 135.448493 174.718757 0.793067 1.558488 -5851.2446 8301.6458 31.169764
552 886.380054 2CXO J140216.3+543313 210.567813 54.553731 131222.705704 13.887674 2.480733 2.445301 115.654637 6.783292 ... -0.322923 -0.118051 149.738677 121.454705 176.358886 1.676834 3.103807 21117.9750 27312.5690 62.076131
553 890.405834 2CXO J140445.4+541452 211.189426 54.247897 139819.410649 10.703677 4.581313 2.651713 135.472019 4.411765 ... 0.501562 1.000000 NaN NaN NaN 0.487466 0.974933 -4952.8736 5313.5560 19.498651
554 892.520732 2CXO J140447.2+542633 211.196627 54.442543 14276.493764 3.416463 1.612100 0.795238 86.007749 2.650000 ... NaN NaN 7.591890 4.313574 10.870206 0.738103 1.710737 -5181.2993 19329.5700 34.214743

555 rows × 33 columns
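As a quick sanity check on the conversion, the sketch below applies the two pixel scales quoted in the code comment above to the 2-\(\sigma\) radius of the first source in the table. The dictionary and function names are hypothetical.

```python
# Pixel scales in arcsec per pixel, taken from the conversion comment above
PIXEL_SCALE = {"ACS/WFC": 0.05, "WFC3/UVIS": 0.03962}

def arcsec_to_pixels(radius_arcsec, instrument):
    """Convert an angular radius to pixels for the given instrument."""
    return radius_arcsec / PIXEL_SCALE[instrument]

two_sigma = 0.988256  # 2Sig of the first X-ray source, in arcsec (as displayed)

r_acs = arcsec_to_pixels(two_sigma, "ACS/WFC")
r_uvis = arcsec_to_pixels(two_sigma, "WFC3/UVIS")
print(f"ACS/WFC:   {r_acs:.4f} pix")
print(f"WFC3/UVIS: {r_uvis:.4f} pix")
```

The ACS/WFC value agrees with the 2Sig (pix) column to within the rounding of the displayed 2Sig value. The same angular radius covers more pixels on WFC3/UVIS because its pixels are smaller, so make sure the scale matches the image you are searching.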

Now everything should be ready to run through DaoClean() using img coordinates:

from XRBID.Sources import DaoClean
from XRBID.WriteScript import WriteReg

DaoCleanFrame = DaoClean(daosources=DaoFrame, sources=M101_best, sourceid="CSC ID", 
                         coordsys="img", coordheads=['X','Y'], radheader="2Sig (pix)") 


# Renaming the DaoID header to "F555W ID", to match the output of the other filters
DaoCleanFrame = DaoCleanFrame.rename(columns={'DaoID': 'F555W ID'})

display(DaoCleanFrame)

WriteReg(DaoCleanFrame, coordsys='fk5', width=2, 
         outfile='../testdata/M101_daoclean_f555w_acs_fk5.reg')
Cleaning DAOFind sources. This will take a few minutes. Please wait.. 
DONE WITH CLEANING. CREATING DATAFRAME...
F555W ID X Y RA Dec CSC ID
0 101364 11331.067786 12533.111216 210.802294 54.348726 2CXO J140312.5+542056
1 101394 11327.699419 12536.055626 210.802374 54.348767 2CXO J140312.5+542056
2 101442 11344.963322 12537.104504 210.801963 54.348782 2CXO J140312.5+542056
3 101505 11323.312878 12538.940418 210.802478 54.348807 2CXO J140312.5+542056
4 101528 11332.293329 12541.942013 210.802264 54.348849 2CXO J140312.5+542056
... ... ... ... ... ... ...
1266 3143 5443.850802 6942.044065 210.942354 54.271014 2CXO J140346.1+541615
1267 52656 3436.981758 10161.470633 210.990268 54.315671 2CXO J140357.6+541856
1268 52713 3445.711155 10162.184256 210.99006 54.315681 2CXO J140357.6+541856
1269 68 6813.402988 4150.018831 210.909697 54.232264 2CXO J140338.3+541355
1270 307 4105.0065 4868.306051 210.97409 54.242176 2CXO J140353.7+541430

1271 rows × 6 columns

Saving ../testdata/M101_daoclean_f555w_acs_fk5.reg
../testdata/M101_daoclean_f555w_acs_fk5.reg saved!

Check the results of the search by plotting both the 2-\(\sigma\) radii of the X-ray sources and the DaoClean sources on the HST image in DS9. If the search appears to be missing sources, as though the search radius were too small, you can modify the DaoClean() parameters to increase the search radius with wiggle (in pixels or degrees). This will make the search a little less strict, so you can include sources whose centroids fall just outside of the 2-\(\sigma\) radius of each X-ray source, if needed.
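The effect of wiggle can be pictured with a small sketch. I am assuming here that wiggle is simply added to each search radius. The function name and the star offset are hypothetical.

```python
import math

def in_search_radius(dx, dy, radius, wiggle=0.0):
    """True if an offset (dx, dy) lies within radius + wiggle (same units)."""
    return math.hypot(dx, dy) <= radius + wiggle

radius_pix = 19.765113  # 2-sigma radius of the first X-ray source, in pixels
dx, dy = 14.5, 14.5     # hypothetical star offset, about 20.5 pixels from the center

strict = in_search_radius(dx, dy, radius_pix)
relaxed = in_search_radius(dx, dy, radius_pix, wiggle=1.0)
print(strict, relaxed)
```

In this example the star sits less than a pixel outside the 2-\(\sigma\) circle, so it is excluded by the strict search and recovered with a 1-pixel wiggle.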

Cross-referencing sources across filters#

The DaoClean() procedure above tells us which F555W sources fall within 2-\(\sigma\) of an X-ray source. Next, we need to figure out what each of these F555W sources is called in the other filters, as identified by RunPhots() in {ref}(sec:runphots), in order to pull the appropriate photometry from each of the photometric data files. This can be done with XRBID.Sources.Crossref().

Unlike DaoClean(), Crossref() takes in a DataFrame containing at least the source IDs and coordinates and cross-references them with the sources in one or more given region files. The region files are read in as a list under the regions parameter. You can define the name of the catalog each region file represents with the catalogs parameter; this defines the headers under which Crossref() will save the cross-referencing results for each region file. (See Incorporating Other Catalogs below for a more general look at using Crossref().)

Note

Sources.Crossref() isn’t actually very different, functionally, from Sources.DaoClean(), except that Crossref will search through multiple region files (AKA catalogs) at once and add as many extra lines to the DataFrame as needed to accommodate all possible associations. Also, Crossref will only keep the coordinates and ID of each source, plus the IDs of sources that may be associated with it in the other catalogs. It is not recommended to use Crossref instead of DaoClean to find the daosources associated with each XRB: several sources may be found within the 2-\(\sigma\) radius of each XRB, and the way the IDs of point sources in each catalog are organized may give the false impression that they are associated with one another across catalogs (e.g. in a single line in the resulting DataFrame, you could have an F555W ID from one nearby star, an F435W ID from another, and an F814W ID from a third; test it out using both Crossref and DaoClean to see what I mean). This can only be mitigated by reducing the search radius, which I do in the Crossref call below by keeping the small default radius.

from XRBID.Sources import Crossref

# The ID of each F555W source is saved in DaoCleanFrame under the header 'F555W ID' (renamed from 'DaoID' above)
DaoCleanMatch = Crossref(DaoCleanFrame, 
                         regions=['../testdata/M101_daofind_f435w_acs_img.reg',
                                  '../testdata/M101_daofind_f814w_acs_img.reg'], 
                         catalogs=['F435W', 'F814W'], coordheads=['X','Y'], 
                         sourceid="F555W ID", outfile="../testdata/M101_daoclean_matches.frame")

DaoCleanMatch
Finding cross-references between sources. This will take a few minutes. Please wait.. 
DONE WITH CLEANING. CREATING DATAFRAME...
X Y F555W ID F435W ID F814W ID
0 11331.067786 12533.111216 101364 143088 NaN
1 11327.699419 12536.055626 101394 143125 148555
2 11344.963322 12537.104504 101442 143189 148645
3 11323.312878 12538.940418 101505 143293 NaN
4 11332.293329 12541.942013 101528 143332 NaN
... ... ... ... ... ...
1397 5443.850802 6942.044065 3143 4807 6523
1398 3436.981758 10161.470633 52656 74953 71911
1399 3445.711155 10162.184256 52713 74989 72017
1400 6813.402988 4150.018831 68 105 100
1401 4105.006500 4868.306051 307 NaN NaN

1402 rows × 5 columns

Note that each source above may be associated with more than one source in any of the catalogs (region files) it’s being cross-referenced with. For example, a source that’s saturated in one filter may appear as multiple sources in a filter with a shorter exposure time. Crossref() keeps track of every possible association, so DaoCleanMatch is likely to end up longer than DaoCleanFrame (here, 1402 rows versus 1271). Keeping the search radius low (the default is 3 pixels) is one way to limit this as much as possible, which is what we’d want in this case.
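The way multiple associations inflate the row count can be sketched as follows. This is not the Crossref() implementation. In particular, the way matches from different catalogs are paired on a row (here with zip_longest) is an assumption, as are the function name and the toy positions.

```python
import math
from itertools import zip_longest

def crossref_sketch(sources, catalogs, search_radius=3.0):
    """Cross-match pixel positions against several catalogs.

    sources:  list of dicts with 'ID', 'X', 'Y'
    catalogs: dict mapping a catalog name to a list of dicts with 'ID', 'X', 'Y'
    Returns one row per possible association, padded with None.
    """
    rows = []
    for src in sources:
        matches = []
        for entries in catalogs.values():
            ids = [e["ID"] for e in entries
                   if math.hypot(e["X"] - src["X"], e["Y"] - src["Y"]) <= search_radius]
            matches.append(ids)
        # At least one row per source, more if any catalog has several matches
        combos = list(zip_longest(*matches)) or [(None,) * len(catalogs)]
        for combo in combos:
            row = {"ID": src["ID"]}
            row.update(dict(zip(catalogs, combo)))
            rows.append(row)
    return rows

# Toy data: one F555W source, two nearby F435W detections, no nearby F814W detection
f555w = [{"ID": 101364, "X": 100.0, "Y": 100.0}]
others = {
    "F435W": [{"ID": 1, "X": 100.5, "Y": 100.2}, {"ID": 2, "X": 101.5, "Y": 99.0}],
    "F814W": [{"ID": 7, "X": 140.0, "Y": 100.0}],
}
rows_default = crossref_sketch(f555w, others)                    # 3-pixel radius
rows_tight = crossref_sketch(f555w, others, search_radius=1.0)   # 1-pixel radius
print(len(rows_default), len(rows_tight))
```

With the 3-pixel radius the single F555W source produces two rows, one per F435W match. Tightening the radius to 1 pixel leaves a single row. Note that IDs sharing a row are only matched to the F555W source, not necessarily to each other.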

It’s also useful for the next step to add the CSC ID back onto the DataFrame, which can be done as follows:

from XRBID.DataFrameMod import Find 

# Adding "CSC ID" to columns in DaoCleanMatch
DaoCleanMatch["CSC ID"] = None

# Adding the CSC IDs of each source the daosources are associated with
for i in range(len(DaoCleanMatch)): 
    # Searches for the CSC ID associated with each F555W ID (originally DaoID in DaoCleanFrame)
    tempid = DaoCleanMatch["F555W ID"][i] 
    tempcsc = Find(DaoCleanFrame, "F555W ID = " + str(tempid))["CSC ID"][0]
    # Assign with .loc; chained indexing (df[col][i] = ...) may write to a copy
    DaoCleanMatch.loc[i, "CSC ID"] = tempcsc
    
# Updating the DataFrame
DaoCleanMatch.to_csv("../testdata/M101_daoclean_matches.frame")
display(DaoCleanMatch)
X Y F555W ID F435W ID F814W ID CSC ID
0 11331.067786 12533.111216 101364 143088 NaN 2CXO J140312.5+542056
1 11327.699419 12536.055626 101394 143125 148555 2CXO J140312.5+542056
2 11344.963322 12537.104504 101442 143189 148645 2CXO J140312.5+542056
3 11323.312878 12538.940418 101505 143293 NaN 2CXO J140312.5+542056
4 11332.293329 12541.942013 101528 143332 NaN 2CXO J140312.5+542056
... ... ... ... ... ... ...
1397 5443.850802 6942.044065 3143 4807 6523 2CXO J140346.1+541615
1398 3436.981758 10161.470633 52656 74953 71911 2CXO J140357.6+541856
1399 3445.711155 10162.184256 52713 74989 72017 2CXO J140357.6+541856
1400 6813.402988 4150.018831 68 105 100 2CXO J140338.3+541355
1401 4105.006500 4868.306051 307 NaN NaN 2CXO J140353.7+541430

1402 rows × 6 columns
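The loop above runs a search through DaoCleanFrame for every row of DaoCleanMatch. If that becomes slow, the same bookkeeping can be done with a lookup table built once. The sketch below uses plain dictionaries with a few rows copied from the tables above. The variable names are hypothetical.

```python
# A few rows standing in for DaoCleanFrame and DaoCleanMatch
daoclean_rows = [
    {"F555W ID": 101364, "CSC ID": "2CXO J140312.5+542056"},
    {"F555W ID": 101394, "CSC ID": "2CXO J140312.5+542056"},
    {"F555W ID": 307, "CSC ID": "2CXO J140353.7+541430"},
]
match_rows = [
    {"F555W ID": 101364, "F435W ID": 143088, "F814W ID": None},
    {"F555W ID": 101394, "F435W ID": 143125, "F814W ID": 148555},
    {"F555W ID": 307, "F435W ID": None, "F814W ID": None},
]

# Build the lookup once. setdefault keeps the first CSC ID seen for each star,
# mirroring the [0] used in the loop above.
csc_by_star = {}
for row in daoclean_rows:
    csc_by_star.setdefault(row["F555W ID"], row["CSC ID"])

# Attach the CSC ID to every cross-referenced row
for row in match_rows:
    row["CSC ID"] = csc_by_star.get(row["F555W ID"])

print(match_rows)
```

Like the original loop, this keeps only one CSC ID per star, so a star that falls within the radius of two overlapping X-ray sources is credited to the first one only.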

Checking Crossref() results#

It’s a good idea to check the results of Crossref() in DS9 by plotting the daofind region files for each filter over the full-color image of the galaxy. If there are some obvious bad matches, you can remove or change them manually in the DataFrame save file created above.

One easy way to do so is to create a bash script that saves images of each CSC source with the region files plotted. You can then go through each image, compare the daofind source IDs associated with each filter to those listed in your *_daoclean_matches.frame file, and adjust as needed. I opt to remove any spurious sources (e.g. false detections) from my .frame file while I’m at it.

The function to create the bash script is XRBID.WriteScript.WriteDS9(), which is described in {ref}(chap:sourceids). Once the script is created, you can open an xterm terminal with xterm & in your command line, and then run bash <bashscriptname>.sh.

I will not go through these steps here, but it’s a good idea to incorporate them in your own workflow.

Once you’re happy with your matches, it’s a good idea to renumber the candidate counterparts, so that you’re not relying on the long Dao IDs. This will make the imaging and donor identification steps we do later a little easier.

# For each X-ray source, number the associated stars starting from 1.
tempcsc = DaoCleanMatch["CSC ID"][0] # Keeping track of the current X-ray source
DaoCleanMatch["StarID"] = None # Storing the star number for each X-ray source

starno = 1 # Star number counter

for i in range(len(DaoCleanMatch)): 
    # If this is a new CSC ID, save it to tempcsc and restart starno counter
    if tempcsc != DaoCleanMatch["CSC ID"][i]: 
        tempcsc = DaoCleanMatch["CSC ID"][i]
        starno = 1
    # Assign with .loc; chained indexing (df[col][i] = ...) may write to a copy
    DaoCleanMatch.loc[i, "StarID"] = starno
    starno += 1 # adds one to the starno counter

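# Saving the candidate region file with new numberings
# Note: the XRBID version used here rejects the `idname` keyword (see the TypeError below);
# check help(WriteReg) for the current name of the ID/label argument.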
WriteReg(DaoCleanMatch, coordsys="image", coordnames=["X","Y"], 
         idname="StarID", width=2, fontsize=12,
         showlabel=True, outfile="../testdata/M101_XRB_candidates.reg")

# DaoCleanMatch has now renumbered each star associated with 
# an X-ray source for simplicity
display(DaoCleanMatch)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[6], line 16
     13     starno += 1 # adds one to the starno counter
     15 # Saving the candidate region file with new numberings
---> 16 WriteReg(DaoCleanMatch, coordsys="image", coordnames=["X","Y"], 
     17          idname="StarID", width=2, fontsize=12,
     18          showlabel=True, outfile="../testdata/M101_XRB_candidates.reg")
     20 # DaoCleanMatch has now renumbered each star associated with 
     21 # an X-ray source for simplicity
     22 display(DaoCleanMatch)

TypeError: WriteReg() got an unexpected keyword argument 'idname'
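The renumbering loop above assumes that all rows belonging to the same X-ray source sit next to each other in the DataFrame. If rows were reordered or removed during the manual checks, a per-source counter is more robust. Here is a minimal sketch with a hypothetical helper name and a shortened list of CSC IDs:

```python
from collections import defaultdict

def number_candidates(csc_ids):
    """Return a 1-based counter for each entry, restarting for every CSC ID."""
    counts = defaultdict(int)
    star_ids = []
    for csc in csc_ids:
        counts[csc] += 1
        star_ids.append(counts[csc])
    return star_ids

csc_column = [
    "2CXO J140312.5+542056",
    "2CXO J140312.5+542056",
    "2CXO J140357.6+541856",
    "2CXO J140312.5+542056",  # deliberately out of order
    "2CXO J140357.6+541856",
]
star_ids = number_candidates(csc_column)
print(star_ids)
```

With a DataFrame, the returned list could be assigned directly to a new StarID column built from the "CSC ID" column.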

Incorporating Other Catalogs#

One thing to keep in mind is that some of the work you find yourself doing here may have already been done by other studies. In particular, keep an eye out for cluster and SNR catalogs that have been published by other groups. If those exist, you will want to download those catalogs and extract the coordinates of the clusters/SNRs that they’ve identified. You should then create a DS9 region file of those sources with WriteScript.WriteReg(). This will become important as you attempt to classify the optical counterparts of each X-ray source, as these catalogs can help inform your final classifications.

If there are one or more catalogs that you can use in your own research, you will likely find it useful to cross-reference the sources in your sample with the sources in those catalogs. As discussed above, Crossref() takes in a DataFrame containing source coordinates (e.g. a DataFrame of your X-ray sources) and cross-references them with the sources in one or more given region files. This makes it useful not just for finding matches across filters, but across different types of source catalogs.

For example, say I have catalogs of compact clusters and SNRs in M101. To identify matches between my X-ray source catalog and both of these, I will save the cluster/SNR catalogs as DS9 region files and run Crossref() as follows:

Crossref(<CSC DataFrame>, regions=[<Name of SNR region file>,
                                   <Name of Cluster region file>], 
                         catalogs=['SNR', 'Cluster'], 
                         coordheads=['RA','Dec'], coordsys='fk5', 
                         search_radius=<radius in degrees for fk5>,
                         sourceid="CSC ID", outfile=<save filename>)

or

Crossref(<CSC DataFrame>, regions=[<Name of SNR region file>,
                                   <Name of Cluster region file>], 
                         catalogs=['SNR', 'Cluster'], 
                         coordheads=['X','Y'], coordsys='img', 
                         search_radius=<radius in pixels for img>,
                         sourceid="CSC ID", outfile=<save filename>)

This will result in a DataFrame containing the CSC ID of each source, its coordinates, and the ID of all corresponding matches from each catalog. The matches will be saved to the data file defined by outfile.
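Because search_radius must be given in the units of the chosen coordinate system, it helps to convert a radius explicitly before calling Crossref(). The helpers below are hypothetical, and the 0.05 arcsec/pixel scale is the ACS/WFC value used earlier in this chapter.

```python
ACS_WFC_SCALE = 0.05  # arcsec per pixel (ACS/WFC value used earlier in this chapter)

def pixels_to_degrees(radius_pix, pixel_scale=ACS_WFC_SCALE):
    """Convert a radius in pixels to degrees, for use with coordsys='fk5'."""
    return radius_pix * pixel_scale / 3600.0

def arcsec_to_pixels(radius_arcsec, pixel_scale=ACS_WFC_SCALE):
    """Convert a radius in arcseconds to pixels, for use with coordsys='img'."""
    return radius_arcsec / pixel_scale

# The 3-pixel default radius expressed in degrees
default_radius_deg = pixels_to_degrees(3)

# A hypothetical 1-arcsec matching radius expressed in pixels
one_arcsec_pix = arcsec_to_pixels(1.0)
print(default_radius_deg, one_arcsec_pix)
```

The appropriate radius for matching against an external catalog depends on that catalog’s astrometric accuracy, so the 1-arcsec value here is only a placeholder.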

Selecting the Best Counterpart#

Selecting the best optical counterpart for each X-ray source is a multi-step process that requires some manual choices based on reasonable arguments. Several different types of objects can be found within 2-\(\sigma\) of an X-ray source with DaoClean(); these will be covered in the next few chapters. The ‘best’ counterpart to any given X-ray source depends on what sort of object you see, as there is a hierarchy of how the sources are prioritized as the ‘best’ counterpart. In order of decreasing priority, this hierarchy is:

  • Background galaxies and foreground stars;

  • Clusters;

  • High-mass stars;

  • Low-mass stars; and finally,

  • Intermediate-mass stars.

Background galaxies and foreground stars are considered contaminants in any XRB catalog, so we remove those first. Since clusters are generally considered the birthplace of most XRBs, they are more likely associated with an XRB than a single isolated star. Finally, if the only counterparts of an X-ray source are a handful of stars, we argue that high-mass stars are more likely to form XRBs than low-mass stars in the disk, while low-mass stars are more likely to survive in an active XRB than intermediate-mass stars. Thus, a high-mass star is more likely to be the donor than a low-mass star, while a low-mass star is more likely to be the donor than an intermediate-mass star. In my work, it usually isn’t necessary to identify the exact donor star if all of the stars within 2-\(\sigma\) are of the same mass class (high, intermediate, or low), although some studies opt to define the donor star as the brightest star, the star closest to the center, or some combination of those factors.
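Once every candidate has been classified, the hierarchy above can be applied mechanically. The sketch below is one way to encode it. The class labels are hypothetical, with "contaminant" standing in for both background galaxies and foreground stars.

```python
# Highest priority first, following the hierarchy listed above
PRIORITY = ["contaminant", "cluster", "high-mass", "low-mass", "intermediate-mass"]
RANK = {label: i for i, label in enumerate(PRIORITY)}

def best_class(candidate_classes):
    """Return the highest-priority class among an X-ray source's candidates."""
    if not candidate_classes:
        return None  # no optical counterpart within 2-sigma
    return min(candidate_classes, key=RANK.__getitem__)

stars_only = best_class(["low-mass", "intermediate-mass", "high-mass"])
with_cluster = best_class(["low-mass", "cluster"])
no_candidates = best_class([])
print(stars_only, with_cluster, no_candidates)
```

This only picks the class of the best counterpart. Choosing a specific star within that class (e.g. the brightest or the closest to the center) is a separate decision, as noted above.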

So, the selection of the best optical counterpart for X-ray sources with multiple optical counterparts should be done in parallel with the steps in the next few chapters.