Identifying Optical Counterparts to X-ray Sources


From this point on, the methodology for identifying the best candidate counterpart for each XRB becomes fairly open-ended. While there are still specific steps that need to be taken, there are countless ways to carry them out! Here, I detail the steps I typically take to optimize my workflow. The details of your workflow do not matter as long as you accomplish the following:

1. Identify optical counterparts that fall within 2-\(\sigma\) of each X-ray source;
2. Within this population, identify contaminants (i.e. foreground stars, background galaxies), clusters, and point-sources (i.e. candidate donor stars);
3. Extract the photometry from clusters and point-sources (AKA stars);
4. Estimate the masses of point-sources and ages of clusters to determine the most likely mass classification (low, intermediate, or high) of each XRB; and
5. Flag possible supernova remnants (SNR).

This chapter covers step 1. The next chapter, Classifying Optical Counterparts, will cover steps 2 and 5. Estimating XRB Masses with CMDs and CCDs covers steps 3 and 4.

Isolating Candidate Counterparts with `DaoClean`#

Now that we’ve identified all HST optical point sources with XRBID.AutoPhots.RunPhots() or photutils.detection.DAOStarFinder() and calculated the best positions and 2-\(\sigma\) radii of each CXO X-ray source, we can finally select the candidate optical counterparts for each XRB. This is done by isolating all sources that fall within the 2-\(\sigma\) radius of each X-ray source. One can do so manually, but the easiest way is to run Sources.DaoClean().

DaoClean() works by taking the coordinates of each source in one DataFrame (passed as sources) and searching a second DataFrame (passed as daosources) for sources that fall within a given radius. It returns a DataFrame equivalent to daosources, but containing only the sources that fall within the search radius of at least one entry in sources. So, for the XRBs in this example, the DataFrame given as sources contains each X-ray source, its coordinates, and its calculated 2-\(\sigma\) radius. The DataFrame given as daosources, on the other hand, contains the point sources identified by RunPhots() (or DAOStarFinder()) in your base filter of choice.
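To make the idea of the radius search concrete, here is a minimal sketch in pixel coordinates using pandas and NumPy. It is not the XRBID implementation: the function name `select_within_radius` is hypothetical, the column names simply mirror those used in this chapter, and the toy coordinates are made up.

```python
import numpy as np
import pandas as pd

def select_within_radius(sources, daosources, radheader="2Sig (pix)",
                         sourceid="CSC ID", wiggle=0.0):
    """Keep rows of daosources that lie within each source's search radius.

    Hypothetical helper for illustration only; not part of XRBID.
    """
    matches = []
    for _, src in sources.iterrows():
        # Pixel distance from this X-ray source to every optical source
        dist = np.hypot(daosources["X"] - src["X"], daosources["Y"] - src["Y"])
        inside = daosources[dist <= src[radheader] + wiggle].copy()
        # Record which X-ray source each optical source was matched to
        inside[sourceid] = src[sourceid]
        matches.append(inside)
    return pd.concat(matches, ignore_index=True)

# Toy data (made-up values, not from the M101 catalogs)
xray = pd.DataFrame({"CSC ID": ["A", "B"], "X": [100.0, 500.0],
                     "Y": [100.0, 500.0], "2Sig (pix)": [20.0, 10.0]})
optical = pd.DataFrame({"DaoID": [1, 2, 3, 4],
                        "X": [105.0, 130.0, 495.0, 700.0],
                        "Y": [110.0, 100.0, 503.0, 700.0]})

cleaned = select_within_radius(xray, optical)
print(cleaned)
```

An optical source that falls inside the radii of two X-ray sources appears once per X-ray source in this sketch, which is one reason the output of such a search can contain repeated optical IDs.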

Before running DaoClean(), you will need to place the coordinates of each RunPhots()/DAOStarFinder() point source into a new DataFrame. This can be done by using DataFrameMod.BuildFrame() or Sources.LoadSources() on the .ecsv data file that was produced in the previous chapter (by default, RunPhots() saves this as something similar to photometry_M101_f555w_acs_full.ecsv), but only if you did not apply an additional shift to your region file. In my case, I decided to apply a shift during the RunPhots() phase, because the region files that were produced did not align well with the HST images, for reasons unknown.

Because my region files are shifted, and because my X-ray coordinates were aligned to these shifted regions, I need to create a DataFrame using the coordinates of those shifted region files. While we’re at it, it’s a good idea to compile both the img and fk5 coordinates into a single DataFrame for greater flexibility down the line.

from XRBID.Sources import GetCoords, GetIDs
from XRBID.DataFrameMod import BuildFrame

# Pulling coordinate and ID information from region files
x,y = GetCoords("../testdata/M101_daofind_f555w_acs_img.reg")
ids = GetIDs("../testdata/M101_daofind_f555w_acs_img.reg")
ra,dec = GetCoords("../testdata/M101_daofind_f555w_acs_fk5.reg")

# Compiling into a single DataFrame
DaoFrame = BuildFrame(headers=['DaoID','X','Y','RA','Dec'], 
                      values=[ids,x,y,ra,dec])
DaoFrame.to_csv("../testdata/M101_daofind_acs_coords.frame")

# To read in and check the .frame file saved above:
import pandas as pd 

# I choose to use pd.read_csv here because of how big this file is
DaoFrame = pd.read_csv("../testdata/M101_daofind_acs_coords.frame").drop("Unnamed: 0", axis=1)
display(DaoFrame)

	DaoID	X	Y	RA	Dec
0	1	7710.069326	3655.196224	210.888379	54.225405
1	2	7712.935688	3688.774732	210.888312	54.225872
2	3	7926.745455	3696.670351	210.883232	54.225984
3	4	7740.506306	3722.354818	210.887658	54.226338
4	5	7802.588601	3728.903381	210.886183	54.226430
...	...	...	...	...	...
178097	178098	8939.411035	19364.831601	210.859372	54.443607
178098	178099	9082.172967	19379.774137	210.855963	54.443815
178099	178100	8920.492988	19385.066555	210.859824	54.443888
178100	178101	8947.054147	19392.263104	210.859190	54.443988
178101	178102	9038.069684	19406.788855	210.857016	54.444190

178102 rows × 5 columns

Important

DaoClean allows the user to run the radius search using the fk5 (degree) coordinate system. However, because of how this system works and how the image is stretched, the search may return unexpected results (e.g. returning DaoClean sources within an ellipse around each XRB instead of a circle). It is recommended that you use the img coordinate system instead, which returns more predictable results.
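One likely contributor to the elliptical search region is that a degree of right ascension spans less sky than a degree of declination away from the celestial equator, by a factor of cos(Dec). The sketch below illustrates this with the standard library only. It assumes the fk5 search compares RA and Dec differences directly in degrees, which is an assumption on my part and not something confirmed from the XRBID source; the function names are hypothetical.

```python
import math

def naive_degree_separation(ra1, dec1, ra2, dec2):
    """Treat RA/Dec as flat Cartesian axes (assumed behavior of a naive search)."""
    return math.hypot(ra1 - ra2, dec1 - dec2)

def small_angle_separation(ra1, dec1, ra2, dec2):
    """Small-angle on-sky separation: the RA offset is scaled by cos(Dec)."""
    mean_dec = math.radians(0.5 * (dec1 + dec2))
    return math.hypot((ra1 - ra2) * math.cos(mean_dec), dec1 - dec2)

# Position of the first X-ray source in the table shown later in this chapter
ra0, dec0 = 210.802227, 54.348952
offset = 1.0 / 3600.0  # a 1 arcsecond coordinate offset, in degrees

# Same coordinate offset applied along RA and along Dec
naive_east = naive_degree_separation(ra0, dec0, ra0 + offset, dec0) * 3600
east = small_angle_separation(ra0, dec0, ra0 + offset, dec0) * 3600
north = small_angle_separation(ra0, dec0, ra0, dec0 + offset) * 3600

print(f"Naive RA offset:  {naive_east:.3f} arcsec")
print(f"On-sky RA offset: {east:.3f} arcsec")
print(f"On-sky Dec offset: {north:.3f} arcsec")
```

At the declination of M101, the same coordinate offset covers noticeably less sky along RA than along Dec, so a fixed radius in degrees traces an ellipse on the sky. Working in img coordinates sidesteps the issue.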

As noted above, it is best to run DaoClean() using the img coordinate system, which gives coordinates and radii in pixel units relative to the reference image, so we will also need to get the img coordinates of the X-ray sources (which up until this point have been plotted in fk5 coordinates).

In DS9, open one of the region files containing the corrected coordinates of the X-ray sources and resave it in DS9 using image coordinates. Then read in the image coordinates and save them to your X-ray source DataFrame. Here, I saved my new region file as M101_bestrads_2sig_img.reg.

from XRBID.Sources import LoadSources, GetCoords

# Loading in the original DataFrame containing the best 2sig radii for each source
M101_best = LoadSources("../testdata/M101_csc_bestrads.frame")

# Reading in the newly-saved img region file
xsources, ysources = GetCoords("../testdata/M101_bestrads_2sig_img.reg")
# Adding the image coordinates to the DataFrame
M101_best["X"] = xsources
M101_best["Y"] = ysources

# Convert the radius from arcseconds to pixels by dividing by the pixel scale.
# The pixel scale is 0.05 arcsec/pixel for ACS/WFC and 0.03962 arcsec/pixel for WFC3/UVIS
M101_best["2Sig (pix)"] = M101_best["2Sig"] / 0.05 

# Saving the changes to the DataFrame
M101_best.to_csv("../testdata/M101_csc_bestrads.frame")
display(M101_best)

Reading in sources from ../testdata/M101_csc_bestrads.frame...

Retrieving coordinates from ../testdata/M101_bestrads_2sig_img.reg

	Separation	CSC ID	RA	Dec	ExpTime	Theta	Err Ellipse Major	Err Ellipse Minor	Err Ellipse Angle	Significance	...	MS Ratio lolim	MS Ratio hilim	Counts	Counts lolim	Counts hilim	1Sig	2Sig	X	Y	2Sig (pix)
0	0.835778	2CXO J140312.5+542056	210.802227	54.348952	98379.672624	1.280763	0.296164	0.295254	89.606978	22.302136	...	-0.616490	-0.485322	234.492581	216.678502	252.306660	0.493011	0.988256	11333.8500	12549.3440	19.765113
1	1.951564	2CXO J140312.7+542055	210.803345	54.348663	49085.676020	4.725025	0.548072	0.380093	86.863933	2.648649	...	-1.000000	-0.715178	209.970149	192.843759	227.096539	0.516471	1.038077	11286.9600	12528.5350	20.761533
2	2.480949	2CXO J140312.5+542053	210.802221	54.348072	98379.672624	1.321652	0.296164	0.295733	84.016491	19.554042	...	-0.364147	-0.265459	252.846172	234.295673	271.396671	0.492957	0.988086	11334.1110	12486.0170	19.761726
3	5.227586	2CXO J140313.1+542052	210.804553	54.347994	132129.439441	2.762119	0.880650	0.578584	29.467809	3.421053	...	-0.973766	-0.718926	23.229213	16.216620	30.241806	0.561749	1.188360	11236.2880	12480.3160	23.767200
4	9.124404	2CXO J140313.5+542053	210.806667	54.348188	98379.672624	1.410741	0.633314	0.466131	58.539300	1.739130	...	-1.000000	-0.370394	7.040343	2.992146	11.088540	0.593981	1.300029	11147.5470	12494.3280	26.000572
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
550	883.411465	2CXO J140138.8+541527	210.411860	54.257747	132129.439441	12.087047	8.647887	6.103176	111.468541	4.277778	...	-0.932542	-0.617739	76.034629	52.273807	99.795450	1.921653	4.517698	27753.7720	6032.4759	90.353960
551	884.529187	2CXO J140450.6+541721	211.211172	54.289330	139822.610581	9.951487	1.357801	1.057531	95.305118	8.976584	...	-0.472829	-0.270456	155.562531	135.448493	174.718757	0.793067	1.558488	-5851.2446	8301.6458	31.169764
552	886.380054	2CXO J140216.3+543313	210.567813	54.553731	131222.705704	13.887674	2.480733	2.445301	115.654637	6.783292	...	-0.322923	-0.118051	149.738677	121.454705	176.358886	1.676834	3.103807	21117.9750	27312.5690	62.076131
553	890.405834	2CXO J140445.4+541452	211.189426	54.247897	139819.410649	10.703677	4.581313	2.651713	135.472019	4.411765	...	0.501562	1.000000	NaN	NaN	NaN	0.487466	0.974933	-4952.8736	5313.5560	19.498651
554	892.520732	2CXO J140447.2+542633	211.196627	54.442543	14276.493764	3.416463	1.612100	0.795238	86.007749	2.650000	...	NaN	NaN	7.591890	4.313574	10.870206	0.738103	1.710737	-5181.2993	19329.5700	34.214743

555 rows × 33 columns

Now everything should be ready to run through DaoClean() using img coordinates:

from XRBID.Sources import DaoClean
from XRBID.WriteScript import WriteReg

DaoCleanFrame = DaoClean(daosources=DaoFrame, sources=M101_best, sourceid="CSC ID", 
                         coordsys="img", coordheads=['X','Y'], radheader="2Sig (pix)") 


# Renaming the DaoID header to "F555W ID", to match the output of the other filters
DaoCleanFrame = DaoCleanFrame.rename(columns={'DaoID': 'F555W ID'})

display(DaoCleanFrame)

WriteReg(DaoCleanFrame, coordsys='fk5', width=2, 
         outfile='../testdata/M101_daoclean_f555w_acs_fk5.reg')

Cleaning DAOFind sources. This will take a few minutes. Please wait..

DONE WITH CLEANING. CREATING DATAFRAME...

	F555W ID	X	Y	RA	Dec	CSC ID
0	101364	11331.067786	12533.111216	210.802294	54.348726	2CXO J140312.5+542056
1	101394	11327.699419	12536.055626	210.802374	54.348767	2CXO J140312.5+542056
2	101442	11344.963322	12537.104504	210.801963	54.348782	2CXO J140312.5+542056
3	101505	11323.312878	12538.940418	210.802478	54.348807	2CXO J140312.5+542056
4	101528	11332.293329	12541.942013	210.802264	54.348849	2CXO J140312.5+542056
...	...	...	...	...	...	...
1266	3143	5443.850802	6942.044065	210.942354	54.271014	2CXO J140346.1+541615
1267	52656	3436.981758	10161.470633	210.990268	54.315671	2CXO J140357.6+541856
1268	52713	3445.711155	10162.184256	210.99006	54.315681	2CXO J140357.6+541856
1269	68	6813.402988	4150.018831	210.909697	54.232264	2CXO J140338.3+541355
1270	307	4105.0065	4868.306051	210.97409	54.242176	2CXO J140353.7+541430

1271 rows × 6 columns

Saving ../testdata/M101_daoclean_f555w_acs_fk5.reg
../testdata/M101_daoclean_f555w_acs_fk5.reg saved!

Check the results of the search by plotting both the 2-\(\sigma\) radii of the X-ray sources and the DaoClean sources on the HST image in DS9. If the search appears to be missing sources, as though the search radius were too small, you can increase the search radius with the wiggle parameter of DaoClean() (in pixels or degrees). This makes the search a little less strict, so that you can include sources whose centroids fall just outside the 2-\(\sigma\) radius of each X-ray source, if needed.
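Alongside the visual check in DS9, a quick tally of candidates per X-ray source can flag where to look first. The sketch below uses small made-up DataFrames as stand-ins for the X-ray source table and the DaoClean output; only the column names are taken from this chapter.

```python
import pandas as pd

# Toy stand-ins for the X-ray source table and the DaoClean output
xray = pd.DataFrame({"CSC ID": ["A", "B", "C"]})
cleaned = pd.DataFrame({"F555W ID": [11, 12, 13, 21],
                        "CSC ID": ["A", "A", "A", "B"]})

# Number of candidate counterparts per X-ray source, including zeros
counts = (cleaned.groupby("CSC ID").size()
                 .reindex(xray["CSC ID"], fill_value=0)
                 .rename("N candidates"))
print(counts)

# X-ray sources with no optical source inside the search radius
empty = counts[counts == 0].index.tolist()
print("No candidates:", empty)
```

Sources with zero candidates are the natural ones to inspect in DS9 before deciding whether a larger wiggle is justified.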

Cross-referencing sources across filters#

The DaoClean() procedure above tells us which F555W sources fall within 2-\(\sigma\) of an X-ray source. Next, we need to figure out what each of these F555W sources is called in the other filters, as identified by RunPhots() in {ref}(sec:runphots), in order to pull the appropriate photometry from each of the photometric data files. This can be done with XRBID.Sources.Crossref().

Unlike DaoClean(), Crossref() takes in a DataFrame containing at least the source IDs and coordinates and cross-references them with sources within one or more given region files. The region files are read in as a list under the regions parameter. You can define the name of the catalog each region file represents with the catalogs parameter; this defines the headers under which Crossref() will save the cross-referencing results for each region file. (See Incorporating Other Catalogs below for a more general look at using Crossref().)

Note

Sources.Crossref() isn’t actually very different, functionally, from Sources.DaoClean(), except that Crossref will search through multiple region files (AKA catalogs) at once and add as many extra lines to the DataFrame as needed to accommodate all possible associations. Also, Crossref will only keep the coordinates and ID of each source, along with the IDs of sources that may be associated with it in the other catalogs. It is not recommended to use Crossref instead of DaoClean to find the daosources associated with each XRB. Several sources may be found within the 2-\(\sigma\) radius of each XRB, and the way the IDs of point sources in each catalog are organized may give the false impression that they are associated with one another across catalogs (e.g. in a single line of the resulting DataFrame, you could have an F555W ID from one nearby star, an F435W ID from another, and an F814W ID from a third; test it out using both Crossref and DaoClean to see what I mean). This is only mitigated by reducing the search radius, which is why the Crossref call below keeps the small default search radius.

from XRBID.Sources import Crossref

# The ID of each F555W source is saved in DaoCleanFrame under the header 'F555W ID'
DaoCleanMatch = Crossref(DaoCleanFrame, 
                         regions=['../testdata/M101_daofind_f435w_acs_img.reg',
                                  '../testdata/M101_daofind_f814w_acs_img.reg'], 
                         catalogs=['F435W', 'F814W'], coordheads=['X','Y'], 
                         sourceid="F555W ID", outfile="../testdata/M101_daoclean_matches.frame")

DaoCleanMatch

Finding cross-references between sources. This will take a few minutes. Please wait.. 

DONE WITH CLEANING. CREATING DATAFRAME...

	X	Y	F555W ID	F435W ID	F814W ID
0	11331.067786	12533.111216	101364	143088	NaN
1	11327.699419	12536.055626	101394	143125	148555
2	11344.963322	12537.104504	101442	143189	148645
3	11323.312878	12538.940418	101505	143293	NaN
4	11332.293329	12541.942013	101528	143332	NaN
...	...	...	...	...	...
1397	5443.850802	6942.044065	3143	4807	6523
1398	3436.981758	10161.470633	52656	74953	71911
1399	3445.711155	10162.184256	52713	74989	72017
1400	6813.402988	4150.018831	68	105	100
1401	4105.006500	4868.306051	307	NaN	NaN

1402 rows × 5 columns

Note that each source above may be associated with more than one source in one of the catalogs (region files) it’s being cross-referenced with. For example, a source that’s saturated in one filter may appear as multiple sources in a filter with a shorter exposure time. Crossref() keeps track of every possible association, so DaoCleanMatch will likely end up longer than DaoCleanFrame (here, 1402 rows versus 1271). Keeping the search radius low (the default is 3 pixels) limits this as much as possible, which is what we want in this case.
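To see which sources picked up more than one association, you can look for F555W IDs that appear on multiple rows. This is a minimal pandas sketch on a made-up stand-in for DaoCleanMatch; the IDs are invented and only the column names follow the output above.

```python
import pandas as pd

# Toy stand-in for DaoCleanMatch (made-up IDs)
matches = pd.DataFrame({
    "F555W ID": [101, 102, 102, 103],
    "F435W ID": [201, 202, 203, None],
    "F814W ID": [301, 302, 302, None],
})

# Rows whose F555W ID appears more than once have multiple associations
multi = matches[matches.duplicated("F555W ID", keep=False)]
print(multi)

# Number of rows per F555W ID, keeping only the repeated ones
n_assoc = matches.groupby("F555W ID").size()
print(n_assoc[n_assoc > 1])
```

These repeated rows are good candidates for the manual inspection described in the next section.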

It’s also useful for the next step to add the CSC ID back onto the DataFrame, which can be done as follows:

from XRBID.DataFrameMod import Find 

# Adding "CSC ID" to columns in DaoCleanMatch
DaoCleanMatch["CSC ID"] = None

# Adding the CSC IDs of each source the daosources are associated with
for i in range(len(DaoCleanMatch)): 
    # Searches for the CSC ID associated with each F555W ID (DaoID in DaoCleanFrame)
    tempid = DaoCleanMatch["F555W ID"][i] 
    tempcsc = Find(DaoCleanFrame, "F555W ID = " + str(tempid))["CSC ID"][0]
    DaoCleanMatch.loc[i, "CSC ID"] = tempcsc  # .loc avoids chained assignment
    
# Updating the DataFrame
DaoCleanMatch.to_csv("../testdata/M101_daoclean_matches.frame")
display(DaoCleanMatch)

	X	Y	F555W ID	F435W ID	F814W ID	CSC ID
0	11331.067786	12533.111216	101364	143088	NaN	2CXO J140312.5+542056
1	11327.699419	12536.055626	101394	143125	148555	2CXO J140312.5+542056
2	11344.963322	12537.104504	101442	143189	148645	2CXO J140312.5+542056
3	11323.312878	12538.940418	101505	143293	NaN	2CXO J140312.5+542056
4	11332.293329	12541.942013	101528	143332	NaN	2CXO J140312.5+542056
...	...	...	...	...	...	...
1397	5443.850802	6942.044065	3143	4807	6523	2CXO J140346.1+541615
1398	3436.981758	10161.470633	52656	74953	71911	2CXO J140357.6+541856
1399	3445.711155	10162.184256	52713	74989	72017	2CXO J140357.6+541856
1400	6813.402988	4150.018831	68	105	100	2CXO J140338.3+541355
1401	4105.006500	4868.306051	307	NaN	NaN	2CXO J140353.7+541430

1402 rows × 6 columns
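The row-by-row loop used above can also be written as a single pandas merge, which tends to be faster on large tables. The following is a sketch on made-up stand-ins for DaoCleanFrame and DaoCleanMatch, not a drop-in replacement tested against the real files.

```python
import pandas as pd

# Toy stand-ins for DaoCleanFrame and DaoCleanMatch (made-up values)
clean = pd.DataFrame({"F555W ID": [101, 102, 103],
                      "CSC ID": ["A", "A", "B"]})
matches = pd.DataFrame({"F555W ID": [101, 102, 102, 103],
                        "F435W ID": [201, 202, 203, 204]})

# One CSC ID per F555W ID (the first one listed, as in the loop)
lookup = clean.drop_duplicates("F555W ID")[["F555W ID", "CSC ID"]]

# A left merge keeps every row of matches, in its original order;
# validate="m:1" raises an error if lookup has repeated F555W IDs
merged = matches.merge(lookup, on="F555W ID", how="left", validate="m:1")
print(merged)
```

Like the loop, this keeps only the first CSC ID for an optical source that falls within the radii of two X-ray sources. Drop the `drop_duplicates` step and the `validate` argument if you would rather keep every pairing.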

Checking `Crossref()` results#

It’s a good idea to check the results of Crossref() in DS9 by plotting the daofind region files for each filter over the full-color image of the galaxy. If there are some obvious bad matches, you can remove or change them manually in the DataFrame save file created above.

One easy way to do so is to create a bash script that saves images of each CSC source with the region files plotted. You can then go through each image, compare the daofind source IDs associated with each filter to those listed in your *_daoclean_matches.frame file, and adjust as needed. I opt to remove any spurious sources (e.g. false detections) from my .frame file while I’m at it.

The function to create the bash script is XRBID.WriteScript.WriteDS9(), which is described in {ref}(chap:sourceids). Once the script is created, you can open an xterm terminal with xterm & in your command line and then run bash <bashscriptname>.sh.

I will not go through these steps here, but it’s a good idea to incorporate them in your own workflow.

Once you’re happy with your matches, it’s a good idea to renumber the candidate counterparts, so that you’re not relying on the long Dao IDs. This will make the imaging and donor identification steps we do later a little easier.

# For each X-ray source, number the associated stars starting from 1.
tempcsc = DaoCleanMatch["CSC ID"][0] # Keeping track of the current X-ray source
DaoCleanMatch["StarID"] = None # Storing the star number for each X-ray source

starno = 1 # Star number counter

for i in range(len(DaoCleanMatch)): 
    # If this is a new CSC ID, save it to tempcsc and restart starno counter
    if tempcsc != DaoCleanMatch["CSC ID"][i]: 
        tempcsc = DaoCleanMatch["CSC ID"][i]
        starno = 1
    DaoCleanMatch.loc[i, "StarID"] = starno  # .loc avoids chained assignment
    starno += 1 # adds one to the starno counter

# Saving the candidate region file with new numberings
WriteReg(DaoCleanMatch, coordsys="image", coordnames=["X","Y"], 
         idname="StarID", width=2, fontsize=12,
         showlabel=True, outfile="../testdata/M101_XRB_candidates.reg")

# DaoCleanMatch has now renumbered each star associated with 
# an X-ray source for simplicity
display(DaoCleanMatch)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[6], line 16
     13     starno += 1 # adds one to the starno counter
     15 # Saving the candidate region file with new numberings
---> 16 WriteReg(DaoCleanMatch, coordsys="image", coordnames=["X","Y"], 
     17          idname="StarID", width=2, fontsize=12,
     18          showlabel=True, outfile="../testdata/M101_XRB_candidates.reg")
     20 # DaoCleanMatch has now renumbered each star associated with 
     21 # an X-ray source for simplicity
     22 display(DaoCleanMatch)

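TypeError: WriteReg() got an unexpected keyword argument 'idname'

Note that, as the traceback shows, the version of WriteReg() used in this run does not accept an idname keyword. Check the function’s signature (e.g. with help(WriteReg)) for the argument that sets the label column before rerunning this cell.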
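The renumbering loop above can also be expressed with a pandas groupby. This is a sketch on made-up data, assuming you want one running count per CSC ID:

```python
import pandas as pd

# Toy stand-in for DaoCleanMatch (made-up IDs); note the rows for
# source "A" are deliberately not all adjacent
matches = pd.DataFrame({"CSC ID": ["A", "A", "B", "A", "B"],
                        "F555W ID": [101, 102, 201, 103, 202]})

# cumcount() numbers the rows within each CSC ID starting from 0
matches["StarID"] = matches.groupby("CSC ID").cumcount() + 1
print(matches)
```

One difference worth knowing: the loop restarts its counter whenever the CSC ID changes between consecutive rows, so it relies on all rows for a given X-ray source being adjacent. The groupby version numbers the stars per CSC ID regardless of row order.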

Incorporating Other Catalogs#

One of the things you will want to keep in mind is that some of the work you may find yourself doing here may have already been done by other studies. In particular, keep an eye out for cluster and SNR catalogs that have been published by other groups. If those exist, you will want to download those catalogs and extract the coordinates of clusters/SNRs that they’ve identified. You should then create a DS9 region file of those sources with WriteScript.WriteReg(). This will become important as you attempt to classify the optical counterparts of each X-ray source, as they can help inform your final classifications.

If there are one or more catalogs that you can use in your own research, you will likely find it useful to cross-reference the sources in your sample with the sources in those catalogs. As discussed above, Crossref() takes in a DataFrame containing source coordinates (e.g. a DataFrame of your X-ray sources) and cross-references them with sources within one or more given region files. This makes it useful not just for finding matches across filters, but also across different types of source catalogs.

For example, say I have catalogs of compact clusters and SNRs in M101. To identify matches between my X-ray source catalog and both of these, I will save the cluster/SNR catalogs as DS9 region files and run Crossref() as follows:

Crossref(<CSC DataFrame>, regions=[<Name of SNR region file>,
                                   <Name of Cluster region file>], 
                         catalogs=['SNR', 'Cluster'], 
                         coordheads=['RA','Dec'], coordsys='fk5', 
                         search_radius=<radius in degrees for fk5>,
                         sourceid="CSC ID", outfile=<save filename>)

or

Crossref(<CSC DataFrame>, regions=[<Name of SNR region file>,
                                   <Name of Cluster region file>], 
                         catalogs=['SNR', 'Cluster'], 
                         coordheads=['X','Y'], coordsys='img', 
                         search_radius=<radius in pixels for img>,
                         sourceid="CSC ID", outfile=<save filename>)

This will result in a DataFrame containing the CSC ID of each source, its coordinates, and the ID of all corresponding matches from each catalog. The matches will be saved to the data file defined by outfile.
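Because search_radius is given in degrees for fk5 and in pixels for img, it helps to start from a tolerance in arcseconds and convert it explicitly. The sketch below shows the arithmetic; the helper names and the 0.5 arcsecond tolerance are hypothetical, and the ACS/WFC pixel scale is the 0.05 arcsec/pixel value used earlier in this chapter.

```python
ARCSEC_PER_DEGREE = 3600.0
ACS_WFC_PIXSCALE = 0.05  # arcsec per pixel for ACS/WFC

def arcsec_to_degrees(radius_arcsec):
    """Convert a radius in arcseconds to degrees (for coordsys='fk5')."""
    return radius_arcsec / ARCSEC_PER_DEGREE

def arcsec_to_pixels(radius_arcsec, pixscale=ACS_WFC_PIXSCALE):
    """Convert a radius in arcseconds to pixels (for coordsys='img')."""
    return radius_arcsec / pixscale

radius_arcsec = 0.5  # hypothetical matching tolerance, for illustration
radius_deg = arcsec_to_degrees(radius_arcsec)
radius_pix = arcsec_to_pixels(radius_arcsec)

print(f"{radius_arcsec} arcsec = {radius_deg:.6f} deg = {radius_pix:.1f} pix")
```

For WFC3/UVIS images, pass pixscale=0.03962 instead.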

Selecting the Best Counterpart#

Selecting the best optical counterpart for each X-ray source is a multi-step process that requires some manual choices backed by reasonable arguments. DaoClean() can return several different types of objects within 2-\(\sigma\) of an X-ray source, which will be covered in the next few chapters. The ‘best’ counterpart to any given X-ray source depends on what sort of objects you see, as there is a hierarchy that determines which source is prioritized as the ‘best’ counterpart. From highest to lowest priority, this hierarchy is:

1. Background galaxies and foreground stars;
2. Clusters;
3. High-mass stars;
4. Low-mass stars; and finally
5. Intermediate-mass stars.

Background galaxies and foreground stars are considered contaminants in any XRB catalog, so we remove those first. Since clusters are generally considered the birthplace of most XRBs, they are more likely associated with an XRB than a single isolated star. Finally, if the only counterparts of an X-ray source are a handful of stars, we argue that high-mass stars are more likely to form XRBs than low-mass stars in the disk, while low-mass stars are more likely to survive in an active XRB than intermediate-mass stars. Thus, a high-mass star is more likely to be the donor than a low-mass star, while a low-mass star is more likely to be the donor than an intermediate-mass star. In my work, it usually isn’t necessary to identify the exact donor star if all of the stars within 2-\(\sigma\) are of the same mass class (high, intermediate, or low), although some studies opt to define the donor star as the brightest star, the star closest to the center, or some combination of those factors.
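Once each candidate has been classified, the hierarchy above can be applied mechanically. The following is a minimal sketch in plain Python; the class labels, source names, and the `best_class` helper are all hypothetical and chosen only for illustration.

```python
# Lower number = higher priority, following the ordering given in the text
PRIORITY = {
    "contaminant": 0,        # background galaxy or foreground star
    "cluster": 1,
    "high-mass": 2,
    "low-mass": 3,
    "intermediate-mass": 4,
}

def best_class(candidate_classes):
    """Return the highest-priority class among a source's candidates."""
    if not candidate_classes:
        return None  # no optical candidates within the search radius
    return min(candidate_classes, key=PRIORITY.__getitem__)

# Made-up classifications of the candidates around four X-ray sources
candidates = {
    "XRB-1": ["low-mass", "high-mass", "intermediate-mass"],
    "XRB-2": ["intermediate-mass", "cluster"],
    "XRB-3": ["low-mass", "intermediate-mass"],
    "XRB-4": [],
}

best = {name: best_class(classes) for name, classes in candidates.items()}
for name, label in best.items():
    print(name, "->", label)
```

This only picks the mass class or object type; deciding which individual star is the donor, when several share the same class, remains a separate choice.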

So, the selection of the best optical counterpart for X-ray sources with multiple optical counterparts should be done in tandem with the steps in the next few chapters.