In this post I apply the t-SNE technique (https://lvdmaaten.github.io/tsne/) to visualize X-ray emission line data from supernova remnant sources observed by the Suzaku telescope. The motivation is to gain useful insight into a complicated dataset using a visualization method, without going into deep detail about the nature of the data at hand.
A supernova remnant (SNR) (https://en.wikipedia.org/wiki/Supernova_remnant) is what is left after a star has exploded as a supernova. In essence it is a large cloud of hot plasma with a temperature of millions of degrees. Young SNRs (less than ~10000 years old) are bright in X-rays. Their spectra are usually very complex, showing many emission lines from the various elements produced during the explosion, the most common being magnesium, silicon and sulphur. X-ray emission lines are an important tool for studying many aspects of supernova physics. Studying this kind of data requires an understanding of advanced topics in astrophysics such as plasma physics, relativity, nuclear physics and others.
However, for this post we only need to keep in mind the following facts about SNR sources. First, SNRs are loosely divided into two broad categories depending on the type of explosion. When the exploded star was a white dwarf, the supernova does not leave any stellar object after the explosion: the star is completely destroyed, leaving only a nebula behind. When a more massive star undergoes a supernova, its core collapses into a compact object, either a neutron star or a black hole, which resides inside the SNR nebula. For a better understanding of various aspects of stellar and galaxy evolution, as well as for cosmology, it is important to classify SNRs by their progenitor type. This is not a trivial task, as the compact star is not always observed. Therefore, some SNRs have their type determined, while for others the type is unknown.
We use spectral data on SNRs from the Suzaku (https://heasarc.gsfc.nasa.gov/docs/astroe/) X-ray telescope. There are 408 individual spectra. Each spectrum is modeled as the sum of a power-law continuum and a subset of 56 X-ray spectral lines. "Subset" means that a particular spectrum usually has far fewer than 56 lines detected.
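As a rough illustration of this kind of model, the sketch below evaluates a power-law continuum plus a few Gaussian lines on an energy grid. The function name, the parameter values and the line list are placeholders chosen for illustration (the centroids sit roughly where Mg, Si and S lines appear), and each line is parameterized by an integrated flux for simplicity. This is not the fitting code that produced the data set.

```python
import numpy as np

def snr_model(energy, norm, gamma, lines):
    """Power-law continuum plus Gaussian emission lines (arbitrary units).

    lines: iterable of (centroid, sigma, flux) tuples, all hypothetical here.
    """
    spectrum = norm * energy ** (-gamma)
    for centroid, sigma, flux in lines:
        profile = np.exp(-0.5 * ((energy - centroid) / sigma) ** 2)
        # Normalize the Gaussian so that `flux` is its integrated area.
        spectrum = spectrum + flux * profile / (sigma * np.sqrt(2.0 * np.pi))
    return spectrum

energy = np.linspace(0.5, 5.0, 901)  # illustrative keV grid, 0.005 keV steps
example_lines = [(1.35, 0.03, 0.5), (1.86, 0.03, 1.0), (2.45, 0.04, 0.4)]
model = snr_model(energy, norm=1.0, gamma=2.0, lines=example_lines)
continuum = snr_model(energy, norm=1.0, gamma=2.0, lines=[])
```

Each detected line contributes three numbers (centroid, width, strength), which is the origin of the three features per line used later in the post.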
# this code is downloaded from https://lvdmaaten.github.io/tsne/
# and edited to work in Python 3
import numpy as Math
import pylab as Plot
from tsne import tsne
The data are a set of centroid energy, sigma and equivalent width values for 56 Gaussians describing individual X-ray emission lines, i.e. there are 168 values per row. Again, only a subset of line features have non-zero values. Data for lines that have not been detected are filled with zeros, because the t-SNE technique can't process empty (NaN) values. There are 408 rows, i.e. the data represent 408 individual spectra.
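To make the layout concrete, here is a minimal sketch of how one such row could be assembled: three values for each of the 56 line slots, with undetected lines left as NaN and then replaced by zeros. The slot indices, the detected-line values and the ordering of features within a row are assumptions made for illustration; the real table is simply loaded from the pickle file.

```python
import numpy as np

N_LINES = 56   # line slots in the spectral model
N_PARAMS = 3   # centroid energy, sigma, equivalent width

# One spectrum: start with every line marked as undetected (NaN).
row = np.full((N_LINES, N_PARAMS), np.nan)

# Hypothetical detections: slot index -> (centroid, sigma, equivalent width).
detected = {4: (1.35, 0.03, 0.12), 9: (1.86, 0.03, 0.40)}
for slot, params in detected.items():
    row[slot] = params

# Flatten to 56 * 3 = 168 features and replace NaN with zeros for t-SNE.
features = np.nan_to_num(row.reshape(-1), nan=0.0)
```

Stacking 408 such rows gives the 408 x 168 matrix that is passed to t-SNE.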
import pandas as pd
lines = pd.read_pickle('for_sne.pkl')
colindex = lines.columns.values
def funpsr(x):
    if x == 0:
        return 0
    return 1
cols = [c[1] > 0 for c in colindex]
#print(cols)
# keep only the columns selected by the boolean mask (.ix is no longer available in pandas)
X = lines.loc[:, cols].values
#print(lines.index)
# per-row SNR type label, source name, pulsar flag and observation ID from the MultiIndex
labels = list(lines.index.get_level_values(3))
sources = list(lines.index.get_level_values(4))
psrs = [funpsr(l) for l in lines.index.codes[5]]
obsids = list(lines.index.get_level_values(1))
#psrs
Y = tsne(X,2,50,30.0)
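The positional arguments in this call follow the signature of the reference Python implementation, tsne(X, no_dims, initial_dims, perplexity): a 2-dimensional embedding, 50 dimensions kept by the initial PCA step, and a perplexity of 30. Perplexity can be read as an effective number of neighbours: for each point, t-SNE tunes the width of a Gaussian kernel until the perplexity of the resulting neighbour distribution matches the requested value. The sketch below computes that quantity for a fixed kernel width on synthetic data; the function name and the data are illustrative only and are not part of tsne.py.

```python
import numpy as np

def perplexity_of_point(X, i, sigma):
    """Perplexity 2**H of the Gaussian neighbour distribution around point i."""
    sq_dist = np.sum((X - X[i]) ** 2, axis=1)
    sq_dist = np.delete(sq_dist, i)            # a point is not its own neighbour
    logits = -sq_dist / (2.0 * sigma ** 2)
    p = np.exp(logits - logits.max())          # shift for numerical stability
    p /= p.sum()
    entropy = -np.sum(p * np.log2(p + 1e-12))  # Shannon entropy in bits
    return 2.0 ** entropy

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))             # synthetic stand-in data
narrow = perplexity_of_point(X_demo, 0, sigma=0.5)
wide = perplexity_of_point(X_demo, 0, sigma=5.0)
```

A wider kernel spreads the probability over more neighbours, so `wide` is larger than `narrow`; it can never exceed the number of other points (199 here).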
Interactive plot: hovering over a point shows the observation ID and the source. Green marks Type I and red marks Type II SNRs. Gray points are SNRs of unidentified type.
from bokeh.io import output_notebook, show
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure
cols = ["gray","green","red"]
colors = [cols[x] for x in labels]
sz = Math.array(psrs,dtype=int)*10+5
source = ColumnDataSource(data=dict(x=Y[:,0],y=Y[:,1],name=sources,obsid=obsids,color=colors,size=sz))
hover = HoverTool(tooltips=[("Object","@name"),("Obsid","@obsid")])
p = figure(title="",width=800,x_range=[-60,60],y_range=[-60,60],height=600,tools=[hover,"pan","wheel_zoom","box_zoom","reset"])
p.xaxis.axis_label = "X"
p.yaxis.axis_label = "Y"
p.circle('x','y',color='color',size='size', fill_alpha=0.2,source=source)
output_notebook()
show(p)
The Type I SNRs (green) are grouped in a separate cluster, with a few exceptions. Core-collapse SNRs are spread over a much broader set of groups, as expected from the "zoo" nature of this SNR type. Bigger circles represent observations of SNRs that harbor a pulsar, which strongly points to a core-collapse origin.
t-SNE is able to separate SNR observations effectively based on the SNR type. A separate cluster of green data points representing Type I SNRs is seen in the top right part of the figure, though some exceptions are observed. A perplexity setting of about 20-30 works best for this analysis. The above result is based on the analysis of the entire set of line features. Selecting particular lines or element species could possibly produce a cleaner separation.