Monday, October 16, 2017

Localization Processor

A friend of mine who goes by the alias Robert J Stanley (no relation) is working on his PhD, and he deals with some of the most intense work I can think of, or maybe it just seems that way because I don't understand it lol. He has to deal with aligning and sequencing genes, mutating bacteria and cells, testing on rats and much more, I'm sure. He really is a workaholic lol. Having said that, even with great people there comes a time when you wish you could do your work faster and/or better. He originally came to me asking for a script to automate a certain program (ImageJ), which supposedly is a standard when it comes to doing whatever it is he is doing at the moment. He showed me the process of what he had to do, and I thought at first it would be a pain to write a script for this program to do everything he wanted, so instead I told him I would make an entirely new program to do everything he asked and a bit more.

The Task:
1) So what he needed to do originally in ImageJ was to essentially greyscale 2 to 3 images.
2) Next he needed to isolate meaningful structures out of it by removing the background.
3) Then came the part where he would have to take those 2 or 3 images and overlap them.
4) Calculate the percentage at which the two overlap which would signify localization.
*There are more detailed steps that he had to do but I don't entirely remember.

So let's make a checklist to show the process of me making this program for him:

I will be using these 2 images to demonstrate.


1) First I needed to greyscale the images. I didn't see a need for this as the images provided were already only in 1 channel each.
For example, the first image only had values in the red channel, image 2 only had values in the green channel, etc. These values were 0-255, so if you look at each image simply by its values in that one channel, then it's already kind of a greyscale image. So I guess that is done.
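To make the "already kind of greyscale" point concrete, here is a small Python sketch (not the code from my actual program). It assumes an image is just a list of (r, g, b) tuples, which is a made-up representation for illustration, and both function names are hypothetical.

```python
def extract_channel(image, channel):
    """Pull one channel (0 = red, 1 = green, 2 = blue) out as a flat list.

    `image` is assumed to be a list of (r, g, b) tuples with 0-255 values.
    """
    return [pixel[channel] for pixel in image]


def used_channels(image):
    """Return the indexes of the channels that contain any non-zero value."""
    return [c for c in range(3) if any(pixel[c] > 0 for pixel in image)]


red_image = [(200, 0, 0), (35, 0, 0), (0, 0, 0)]
print(used_channels(red_image))       # only the red channel is in use
print(extract_channel(red_image, 0))  # the "greyscale" values
```

If `used_channels` only returns one index, the values in that channel can be treated as the greyscale image directly.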

2) Next what he had to do was isolate the meaningful bits of the image by removing the background. This was going to be pretty simple, I thought. What I decided to do was go through every pixel, and if the pixel had a value greater than 0, I incremented a counter and added that value to a running total. I also had a variable keeping track of the highest value it came across, for later use. After the loop completed I divided the total by the counter to get a final value, which represents the average intensity of all the meaningful bits (pixels which weren't 0) in the picture. Anything below that average was treated as background and set to 0. The images then looked like this:
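Here is a rough Python sketch of that pass over the pixels. It is not the original program's code. It assumes a channel is a flat list of 0-255 values, and the function names are made up. The `remove_background` part is also my reading of the step: it zeroes everything below the average, which matches the "gap between 0 and the average" that comes up later.

```python
def channel_stats(pixels):
    """Return (average of the non-zero values, highest value) for one channel."""
    total = 0    # running sum of all non-zero intensities
    count = 0    # how many non-zero pixels were seen
    highest = 0  # brightest value found, kept for later use
    for value in pixels:
        if value > 0:
            total += value
            count += 1
            if value > highest:
                highest = value
    # Assumption: an all-black channel simply gets an average of 0.
    average = total / count if count else 0.0
    return average, highest


def remove_background(pixels, average):
    """Assumed rule: anything below the average counts as background."""
    return [value if value >= average else 0 for value in pixels]


channel = [0, 10, 20, 0, 30]
average, highest = channel_stats(channel)
print(average, highest)
print(remove_background(channel, average))
```

Zero-valued pixels are skipped when averaging so that large black areas don't drag the average down.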


3) His next step was to overlap them. I simply did this by filling in all the green values (which were all 0s) in the original red image with the values from the original green image. I say original because these were the untouched images, not the ones I removed the background from in step 2. This produced an image with some areas of only red intensities, some areas of only green intensities, and potentially areas with both red and green intensities, which give shades of yellow, some leaning towards green and some towards red. This wasn't a very good image in some cases, as some of the intensities were very slight and resulted in images where localization wasn't all that easy to spot.
Like so:

Even overlapping the 2 images where I had removed the backgrounds didn't look that much better:
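The overlay from step 3 is simple enough to sketch in a few lines of Python. Again, this is only an illustration and not the program's actual code: it assumes both images are same-sized lists of (r, g, b) tuples, and `overlay` is a hypothetical name.

```python
def overlay(red_image, green_image):
    """Copy the green channel of `green_image` into `red_image`.

    Both images are assumed to be equally long lists of (r, g, b) tuples.
    The red and blue values are kept from the red image.
    """
    merged = []
    for (r, _, b), (_, g, _) in zip(red_image, green_image):
        merged.append((r, g, b))
    return merged


red_image = [(200, 0, 0), (0, 0, 0)]
green_image = [(0, 150, 0), (0, 90, 0)]
print(overlay(red_image, green_image))
```

The same function works for either pair of inputs, the untouched images or the ones with the background removed.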

I mean you can generally see where it potentially localizes (I say potentially because in all fairness I have no idea what I'm talking about lol) but it's not very defined. Lack of contrast, I guess, would be the technical statement here. So I decided to increase that. Now I can't just go in and arbitrarily change values without some kind of control. I needed something consistent and controlled which could be applied to all the images. Well, I thought, there is a gap between 0 and the average value I computed in step 2, and also a gap between the highest intensity value I found in step 2 and the max value possible, 255, so let's stretch whatever the value is for each pixel linearly with those 2 gaps in mind. Doing this made it so that the lowest intensity value after removing the background is mapped to 0, the highest intensity found is mapped to 255, and any value in between is linearly and proportionately mapped somewhere between 0 and 255. I came up with the following formula: Y = 255 * (X - A) / (H - A), where X is the pixel's current value, H is the highest intensity value obtained in step 2 and A is the average meaningful intensity value obtained in step 2. Y was then set to 255 if the resulting calculation was greater than 255.
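As a sketch, the stretch could look like this in Python. It is not the original code: it assumes the result is meant to land on the 0-255 scale described above (hence the factor of 255), and the clamp to 0 for values below A and the handling of H equal to A are my own additions. `stretch` is a hypothetical name.

```python
def stretch(value, average, highest):
    """Linearly map the range [average, highest] onto [0, 255]."""
    if highest <= average:
        # Assumption: with no range to stretch, leave the value alone.
        return value
    scaled = 255 * (value - average) / (highest - average)
    # Clamp to the valid range; the lower clamp is an assumption.
    return max(0, min(255, int(scaled)))


# With A = 20 and H = 30, the average maps to 0 and the maximum to 255.
print([stretch(v, 20, 30) for v in (20, 25, 30)])
```

Because A and H come from the image itself, the same rule can be applied to every image without picking numbers by hand.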
After doing this the images were much more defined in terms of color, and structures that you couldn't see before popped out:


And the overlay composed from these two images was much more telling:

Now that I had the image finalized, I had to use it to compute how much of it was potentially localizing. This took me the longest, as it was mostly statistics and boy do I hate statistics. I ultimately decided to take the average of 2 methods I designed, after first computing a couple of things for both methods to work with.
One thing I needed to calculate was a new average. Since I was going to analyze both images, and hence 2 channels, I couldn't use the individual averages obtained in step 2. I started off by setting the initial new average to the sum of the step 2 averages from both images divided by 2. This essentially gives me the average of the averages lol. I then iterated through all the pixels of the overlay from step 3, and for every pixel whose value was greater than or equal to that initial new average, I added the value to a variable T and incremented a counter. A new channel average was then calculated by dividing T by the counter. I did this separately for the red and the green channel, which gave me RedAverage and GreenAverage. The finalized new average, RedGreenAverage, was calculated by adding RedAverage and GreenAverage and dividing by 2. With these new averages (RedAverage, GreenAverage, RedGreenAverage) I was then able to run the two methods. Both methods share one condition: a pixel is only considered if its green intensity is greater than or equal to RedGreenAverage. We use green because that is the channel of interest. We want to see if the red overlays the green.
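The three averages can be sketched in Python like this. It is an illustration under assumptions, not the program's code: the overlay is taken to be a list of (r, g, b) tuples, `refined_averages` is a hypothetical name, and returning 0 when no pixel passes the threshold is my own choice.

```python
def refined_averages(overlay, red_avg, green_avg):
    """Return (RedAverage, GreenAverage, RedGreenAverage).

    `red_avg` and `green_avg` are the per-image averages from step 2.
    """
    threshold = (red_avg + green_avg) / 2  # the "average of the averages"

    def channel_average(values):
        # Only values at or above the threshold contribute to T and the counter.
        kept = [v for v in values if v >= threshold]
        return sum(kept) / len(kept) if kept else 0.0  # assumed fallback

    red_average = channel_average([pixel[0] for pixel in overlay])
    green_average = channel_average([pixel[1] for pixel in overlay])
    return red_average, green_average, (red_average + green_average) / 2


pixels = [(200, 150, 0), (0, 90, 0), (100, 250, 0), (40, 10, 0)]
print(refined_averages(pixels, 100, 100))
```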

The first method keeps two running totals, GT and RT, and for every pixel that passes the shared condition:
If green intensity is greater than or equal to GreenAverage and red intensity is greater than or equal to RedAverage, then add 2 to GT and add 2 to RT.
If green intensity is greater than or equal to GreenAverage and red intensity is less than RedAverage, then add 2 to GT and add 1 to RT.
If green intensity is less than GreenAverage and red intensity is greater than or equal to RedAverage, then add 1 to GT and add 1 to RT.
If green intensity is less than GreenAverage and red intensity is less than RedAverage, then add 1 to GT.
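Those four rules translate pretty directly into code. Here is a Python sketch, assuming the overlay is a list of (r, g, b) tuples; `method_one` is a hypothetical name and this is not the original implementation.

```python
def method_one(overlay, red_average, green_average, red_green_average):
    """Tally RT and GT using the four weighting rules."""
    gt = 0
    rt = 0
    for r, g, _ in overlay:
        if g < red_green_average:
            continue  # shared condition: skip pixels with weak green
        green_high = g >= green_average
        red_high = r >= red_average
        if green_high and red_high:
            gt += 2
            rt += 2
        elif green_high:
            gt += 2
            rt += 1
        elif red_high:
            gt += 1
            rt += 1
        else:
            gt += 1
    return rt, gt


pixels = [(200, 250, 0), (100, 210, 0), (160, 180, 0), (10, 180, 0), (255, 100, 0)]
print(method_one(pixels, 150, 200, 175))
```

In the example the last pixel is ignored entirely because its green value is below RedGreenAverage, no matter how strong the red is.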

The second method keeps two more running totals, GT2 and RT2, and for every pixel that passes the shared condition:
Add 255 to GT2. The reason we use 255 is that GT2 stores the green values, and green represents the protein of interest, so if it's there at all it should be given the most importance, which is 100 percent, and in this context that is 255.
If red intensity is greater than 0 then add red intensity to RT2.
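A Python sketch of this second tally, under the same assumptions as before (overlay as a list of (r, g, b) tuples, hypothetical function name, not the original code):

```python
def method_two(overlay, red_green_average):
    """Tally RT2 and GT2: green always counts as 255, red counts as itself."""
    gt2 = 0
    rt2 = 0
    for r, g, _ in overlay:
        if g < red_green_average:
            continue  # shared condition: skip pixels with weak green
        gt2 += 255    # green present at all is treated as 100 percent
        if r > 0:
            rt2 += r
    return rt2, gt2


pixels = [(200, 250, 0), (100, 210, 0), (160, 180, 0), (10, 180, 0), (255, 100, 0)]
print(method_two(pixels, 175))
```

RT2 / GT2 then works out to the average red intensity over the considered pixels, expressed as a fraction of 255.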

Finally I do ((((((RT / GT) * 100) - 50) * 100) / 50) + ((RT2 / GT2) * 100)) / 2
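Written out as a Python function, the final formula looks like this. The function name is made up, and returning 0 when no pixel was considered (GT or GT2 is 0) is my assumption, since the formula itself would divide by zero there.

```python
def localization_percent(rt, gt, rt2, gt2):
    """Average the two method scores into one percentage."""
    if gt == 0 or gt2 == 0:
        return 0.0  # assumption: nothing to measure means 0 percent
    first = (((rt / gt) * 100) - 50) * 100 / 50  # rescales 50..100 to 0..100
    second = (rt2 / gt2) * 100
    return (first + second) / 2


print(localization_percent(2, 2, 255, 255))  # every pixel fully overlapping
print(localization_percent(1, 2, 0, 255))    # no red at all
```

One thing the code makes easy to see: the first term maps RT / GT = 0.5 to 0 and RT / GT = 1 to 100, so a ratio below 0.5 would make that term negative.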

For this particular image I got ~82% localization.

All that was left was to make it batch process multiple pairs of images and output all the percentages to a CSV file, and I was done. This program took me about 2 days to write, and my friend checked it a bunch of times for consistency and false positives throughout its development. He likes to state that this program is better than ImageJ, and while that may be true for him in this particular use case, I wouldn't count ImageJ out of the picture, as it is much more fully featured. I like to think that the two programs just work differently is all.
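For the CSV part, a minimal Python sketch using the standard `csv` module could look like the following. The column names, file names and the percentage in the example are all made up for illustration, and this is not how my program is actually written.

```python
import csv
import io


def write_report(results, handle):
    """Write one row per image pair to an open text file or buffer.

    `results` is a list of (red_name, green_name, percent) tuples.
    """
    writer = csv.writer(handle)
    writer.writerow(["red_image", "green_image", "localization_percent"])
    for red_name, green_name, percent in results:
        writer.writerow([red_name, green_name, f"{percent:.2f}"])


# Placeholder data; a real run would fill this in from the image pairs.
buffer = io.StringIO()
write_report([("pair1_red.png", "pair1_green.png", 50.0)], buffer)
print(buffer.getvalue())
```

Writing to a handle instead of a file name keeps the function easy to test. For a real file, open it with `newline=""` as the `csv` module documentation recommends.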


