BREAKTHROUGH levels of false positive reduction: what does it mean for your innovation and ROI

In two recently released analyses, we demonstrated a never-before-achieved reduction of false positives using our product ReSurfX::vysen on a dataset that has been studied extensively to develop numerous analytic technologies. That data comes from large-scale gene expression analysis, but read on: don’t let that field of application sway your interest. This level of accuracy was made possible by a breakthrough, data-source-agnostic 'Adaptive Hypersurface Technology' (AHT) that ReSurfX is leveraging in data analytics and continuing to develop. Here, we discuss the state of the art in data analytics, along with current thinking and problems (from popular media and authoritative sources). What does it mean for your innovation and ROI?

The result shown uses a dataset that has been studied extensively by many leading experts. How, then, is such an unprecedented level of accuracy possible?

AHT is a novel concept that differs fundamentally from most analytics in use: it is not primarily driven by the statistical concepts behind the p-value.

Because AHT is data-source agnostic, it can bring value to data analytics initiatives at many points in your workflow and for many of your organization's operational steps and goals.


Aren’t p-values and statistics well-established concepts for data analytics, and shouldn’t we be able to achieve such results with improved large-scale computing and storage?

Yes, p-value-based statistics is well established and fundamental to a lot of the analysis we do. But its application is fraught with errors. To learn more about the problems, misuse and pitfalls of that approach, check out these resources (videos and articles), ranging from popular media to a highly technical guidance statement from THE leadership organization for statistics.


John Oliver’s comedic take on the serious problem of p-hacking (VIDEO)


Vox article on the debate about p-values and how to fix them


American Statistical Association statement and guidance on the use of p-values


A brief detour to help technical leaders get into the flow of this article:

One characteristic specific to the data used in the results above indicates that AHT is a broad solution applicable to most problems we solve in analytics (to create and analyze data). Briefly, both gene expression technologies used here measure every gene in the organism (say, human) using ‘multiple independent measures’ for each gene. Because the most powerful approaches are difficult to implement and have limitations, these multiple measures are usually summarized into a single value for each gene. The difference is stark: it is like using the many lifestyle and genetic factors that can increase (or decrease) your longevity IN CONTRAST TO using a model to summarize all of them into a single value (something like your credit score) and predicting your longevity from that. The power of the first of these two approaches is obvious. This is where AHT brings a novel and extensible approach to combine and use many different parameters to determine outcomes of value to data analytics.
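To make the contrast between a single summarized value and multiple independent measures concrete, here is a minimal sketch. It is not AHT and not how ReSurfX::vysen works; the gene names, the readings and the `min_change` cutoff are all made up for illustration. It only shows that two genes can look identical once summarized while their individual measures tell very different stories.

```python
from statistics import mean

# Made-up log-scale readings for two hypothetical genes. Each gene has four
# independent measures in a control sample and in a treated sample.
genes = {
    "gene_A": {"control": [5.0, 5.25, 4.75, 5.0], "treated": [6.0, 6.25, 5.75, 6.0]},
    "gene_B": {"control": [5.0, 5.25, 4.75, 5.0], "treated": [9.0, 5.25, 4.75, 5.0]},
}

def summarized_change(gene):
    """Collapse all measures to one value per sample, then compare."""
    return mean(gene["treated"]) - mean(gene["control"])

def measures_agreeing(gene, min_change=0.5):
    """Count the individual measures that each show an increase of at least min_change."""
    return sum(
        1 for c, t in zip(gene["control"], gene["treated"]) if t - c >= min_change
    )

for name, gene in genes.items():
    print(name, summarized_change(gene), measures_agreeing(gene))
```

Both hypothetical genes show the same summarized change, but all four measures support it for gene_A while a single outlying measure drives it for gene_B. That is the information lost when everything is reduced to one number per gene.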

Finally, for specialists in life sciences and healthcare: if you can achieve this level of accuracy from just your gene expression data by using ReSurfX::vysen (simplistically speaking), the value you get is enormous. A very powerful use case is a recent article published in the journal Science, which demonstrated that gene expression can be used to predict outcomes in cancer patients.

You may instantly relate the power of AHT to hundreds of other applications you are working on now.

What is the direct impact of False Positives on your ROI?

One of our colleagues wrote a nice outline on this recently: Positive results but Negative returns: Analytics challenges in the Life Sciences industry.

To give one example, suppose a patient is tested for a particular cancer based on their symptoms and the result is ‘no cancer’ when they actually have an early-stage cancer that could be cured if treated at that stage. Strictly speaking, for the question ‘does this patient have cancer’ that is a false negative; the mirror-image error, a ‘cancer’ result for a patient who has none, is a false positive. Either error affects the patient, AT THE CENTER, as well as their doctor and medical institution, who must come up with the right outcome prediction, treatment plan and choice of product applicable to the disease from among those offered by different drug development companies, and who need to predict quickly and early how those are working so the treatment plan can be changed as needed. We shouldn’t forget the payers (the insurance company and the patient), who are needed to keep healthcare effective and its cost structure sustainable.

Being able to identify and reduce errors can lead to tremendous cost savings, and thus improve your ROI and time to market (in healthcare, the ROI extends to society). It also helps drive innovation, as these errors have a cumulative effect on your enterprise goals.


So, why do you have a terabyte or more of error in your workflow confounding your decisions and outcomes? What is special now and here? How does it relate to ongoing developments over the last decade or more?

These days, the ability to collect, store, compute on and make predictions from data for your ongoing business or research efforts is amazing. The first level of need after data collection is infrastructure for storage and compute. One of the next major changes needed to harness big data should happen in analytics (to predict outcomes of interest).

Fundamental to most recent developments in the use of Big Data is ‘Moore’s Law’, the observation that the number of transistors (the basic units used in computer chips) on a chip doubles roughly every two years as transistors shrink. That trend held for decades, but we now seem to be reaching the limits of what shrinking transistor size can deliver.

See here for a nice review by MIT Technology Review magazine: Moore’s Law Is Dead. Now What?

Another major development that came about through large-scale use and standardization is the commoditization of ‘infrastructure’ for working with large volumes of data. A key example of commoditization is cloud-based solutions, where you can rent the resources you need and scale up or down as needs emerge. Of course, you need to pay attention and have the appropriate technical prowess to handle security, among other needs.

With these advancements comes your ability to collect and use a lot of data. But if you use just one petabyte of data and have an error rate of one in a thousand (1/1000), your workflow now has a terabyte of errors. Often the error rate is much higher than this, despite each analytics product claiming far lower errors. Based on our extensive calculations while comparing the solutions we provide, the data and analytics outcomes in your workflow have 30% error: this includes fundamental measurement errors, errors in your analytics, and errors originating from databases you use from other vendors in the process of trying to gain insights from your data.
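The arithmetic behind the terabyte figure is easy to check. The sketch below assumes decimal (SI) units and, purely for simplicity, that errors are spread evenly across the data; the function name is ours and only illustrative.

```python
# Back-of-the-envelope check of how much data a given error rate touches.
# Assumption: decimal (SI) units, so 1 PB = 10**15 bytes and 1 TB = 10**12 bytes.
PETABYTE = 10**15
TERABYTE = 10**12

def erroneous_bytes(total_bytes: int, errors: int, per: int) -> int:
    """Bytes affected at a rate of `errors` per `per`, assuming errors are spread evenly."""
    return total_bytes * errors // per

one_in_thousand = erroneous_bytes(PETABYTE, 1, 1000)
thirty_percent = erroneous_bytes(PETABYTE, 30, 100)

print(one_in_thousand // TERABYTE)  # prints 1   (1 TB affected at 1 in 1,000)
print(thirty_percent // TERABYTE)   # prints 300 (300 TB affected at 30%)
```

Integer arithmetic is used on purpose so the results are exact rather than subject to floating-point rounding.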


Coming back to the gene expression analysis experiment: what is special here, when experts from various arenas have been working on this for a decade or two, depending on which platform you are talking about?

ReSurfX::vysen leverages AHT, a novel technology that is not built with p-value-based statistics as its dominant underlying principle. It is thus able to overcome many of the shortcomings pointed out in the video and articles on p-values and p-hacking mentioned earlier. ReSurfX::vysen is enterprise-grade, cloud-based software built by experts who have taken care of the security and compliance needs required for effective use in large enterprises, including our primary target of life sciences and healthcare applications. Ask for a DEMO, or SIGNUP for one of our tiers of ReSurfX::vysen product offerings to check this out to your satisfaction.

ReSurfX::vysen gave an unprecedented result that has not been achieved by any other published or commercially available analytics. On an extensively studied dataset generated by the sequencing and microarray quality control consortia (SEQC & MAQC) using RNAseq and a microarray technology, vysen identified ‘ZERO’ false positive differentially expressed genes from about 20 million calls in 360 comparisons, each involving over 50,000 genes.
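To put the scale of those comparisons in perspective, here is a rough sketch of what a conventional, uncorrected p-value threshold would be expected to produce across that many calls. It says nothing about how vysen or AHT works. The thresholds are illustrative assumptions, not figures from the analysis, and the calculation assumes the worst case in which no gene is truly differentially expressed.

```python
# Figures taken from the text above. "Over 50,000" genes is treated as exactly
# 50,000, which is why the total comes out slightly below "about 20 million".
COMPARISONS = 360
GENES_PER_COMPARISON = 50_000
total_calls = COMPARISONS * GENES_PER_COMPARISON

def expected_false_positives(n_tests: int, alpha: float) -> int:
    """Expected false positives if every null hypothesis is true and each
    test is judged against an uncorrected threshold alpha."""
    return round(n_tests * alpha)

def bonferroni_threshold(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test threshold under the standard Bonferroni correction."""
    return family_alpha / n_tests

print(f"total calls: {total_calls:,}")
for alpha in (0.05, 0.01, 0.001):  # illustrative thresholds, not from the analysis
    count = expected_false_positives(total_calls, alpha)
    print(f"alpha={alpha}: about {count:,} expected false positives")
print(f"Bonferroni per-test threshold: {bonferroni_threshold(total_calls):.2e}")
```

Under these assumptions, even a strict per-test threshold leaves thousands of expected false positives, which is why corrections such as Bonferroni push the per-test threshold to extremely small values.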

We will get to more on false negatives and their impact in an upcoming post.

Such powerful false positive control through Big Data analytics, built from novel and robust solutions outside the standard toolbox of even the most sophisticated analytics, is one powerful value ReSurfX offers to your enterprise.


What can you conclude from the outline above in the context of the problems, solutions and outcomes we offer and highlight?

The field of Big Data analytics has enormous potential to improve innovation and ROI in your organization. In principle, it can be a bonanza for a mid-sized enterprise, like finding a key lead candidate for drug development or detecting and treating a cancer. But enormous errors are wasting your manpower allocation and resource utilization in general, and confounding the ability of your experts to use these advances and their knowledge well to serve the purpose of your organization.

We are helping you solve many of these problems in unique ways, helping your organization leverage its experts and expertise effectively. We do this through innovative solutions that fall outside the well-understood and currently practiced framework for analytics, as well as by innovatively modifying existing tools in our products and other value offerings. Ah, we almost forgot to mention our customers’ comment: ‘how nice the user experience is built in ReSurfX::vysen for a heavy duty analytics platform’.

