At Indizen we have developed several in-house R&D&I projects with both public and private funding. The results of this research and development work have had a significant impact on the company and have allowed us to:
- Establish an innovative culture in the company.
- Create new cutting-edge software tools that improve business processes for our client companies and for society.
- Generate new business for our shareholders and employees.
- Collaborate with other research centres.
We have developed projects for the energy and health sectors, co-financed by the Ministry of Industry under the Avanza R&D Program. These tools have generated business for the company and have allowed us to create new jobs.
In addition to these in-house projects, we develop numerous projects for our client companies, following Indizen's project development and management methodology.
R&D SUCCESS STORIES
Currently, the great majority of technology projects involve a great deal of information processing and analysis. This information may be owned by the company or may need enrichment from external sources. Indizen has extensive experience in obtaining data from different sources, usually unstructured, and transforming it into structured data so that it can be stored homogeneously for subsequent enrichment and analysis.
We have worked with multiple Internet service APIs that have provided us with data as disparate as business information, weather values, telephone antenna distribution and demographic data. Combined in layers with government Open Data, these data enrich our clients' own information in ways that are very valuable for analysis.
These enrichments have allowed us to design highly targeted commercial actions and to predict machine breakdowns and population behavior in order to improve services to the public.
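As a minimal sketch of this layered enrichment, the example below joins client records with two external layers on a shared key. The field names, the postal-code key and all values are hypothetical; real sources would come from the APIs and Open Data portals mentioned above.

```python
# Hypothetical client records and external layers; all names and values are illustrative.
clients = [
    {"client_id": 1, "postal_code": "28001"},
    {"client_id": 2, "postal_code": "08001"},
    {"client_id": 3, "postal_code": "41001"},
]
demographics_by_postal_code = {
    "28001": {"population": 30000},
    "08001": {"population": 40000},
}
weather_by_postal_code = {
    "28001": {"avg_temp_c": 15.0},
    "41001": {"avg_temp_c": 19.5},
}


def enrich(records, layers, key):
    """Left-join each record with every layer on `key`.

    A record keeps its own fields even when a layer has no entry for it.
    """
    enriched = []
    for record in records:
        merged = dict(record)  # copy so the original record is not modified
        for layer in layers:
            merged.update(layer.get(record[key], {}))
        enriched.append(merged)
    return enriched


enriched_clients = enrich(
    clients,
    [demographics_by_postal_code, weather_by_postal_code],
    "postal_code",
)
```

Keeping each external source as a separate layer makes it easy to add or remove a source without touching the client data itself.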
We are experts in the use of information extraction frameworks and in calling the APIs of text analysis tools. We do not focus on a single tool; on the contrary, we adapt to the needs of the project and the client to obtain the best possible solution for the data we must work with.
Natural Language Processing
One of the challenges of the digital transformation process is storing and processing the huge amount of data generated. Current databases allow you to structure, store, search and access data in a simple, standard way, with good performance and reliability. However, it is essential to structure and understand these data in order to exploit them properly.
Natural language processing (NLP) and other information analysis strategies come into play in this context. Part of the data we find is in the form of free text, generated by people in the course of their normal work and social relationships. Using NLP we can analyze these texts at several levels to obtain syntactic and semantic information, adding value to the data. Subsequently, we can infer knowledge from the information obtained using statistical methods.
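The simplest form of this pipeline can be sketched as tokenizing free text and counting term and word-pair frequencies, which is only the lexical level and a first statistical step. The sample reports and the function names are hypothetical, and a real system would use a full NLP toolkit rather than a regular expression.

```python
import re
from collections import Counter

# Hypothetical free-text notes; the content is illustrative only.
reports = [
    "Patient reports chest pain and shortness of breath.",
    "Chest pain resolved after treatment.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bigrams(tokens):
    """Return consecutive token pairs, a crude proxy for multi-word terms."""
    return list(zip(tokens, tokens[1:]))


term_counts = Counter()
bigram_counts = Counter()
for report in reports:
    tokens = tokenize(report)
    term_counts.update(tokens)
    bigram_counts.update(bigrams(tokens))

# Word pairs that occur in more than one place are candidate terms for later analysis.
frequent_pairs = [pair for pair, count in bigram_counts.items() if count > 1]
```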
In the medical field, text analysis is a key element in the clinical documentation process. On the one hand, we develop non-intrusive tools so that the doctor or other health professionals can write their reports in a correct format. In addition, we can offer recommendations based on the prior knowledge generated by the system. Subsequently, NLP tools are used to process the report in the next steps of its life cycle, which are related to aspects of coding and statistical study, among others.
One of the most striking features of the data generated in a medical center is its heterogeneity. This variability stems from the high complexity of medical and surgical protocols, which at the same time is reflected at the reporting and management level.
Studying this type of data with conventional data analysis techniques is complicated. The very nature of the data makes processing it with static models inefficient in both time and resources.
For this reason, we use machine learning with the following objectives:
• Discover patterns in both medical and administrative data to add value to the clinical documentation process.
• Generate natural language processing tools without the need to manually label medical diagnoses and procedures.
• Review and validate medical documentation based on the recommendations of systems trained for this purpose (study of typical cases and outliers).
• Graph healthcare activity, making its visualization and management more accessible.
• Improve patient safety by discovering relationships between variables that may be coupled.
Semantic search algorithms
What are semantic searches and why are they important?
Semantic searches seek to improve information retrieval by using concept categorizations and semantic networks, with the intention of disambiguating queries and returning the most relevant results. The difficulty of these searches lies in emulating human behavior when establishing semantic equivalences between different expressions. Ideally, a system should return the same results, ordered by relevance, for different queries that are semantically equivalent.
Semantic search is growing in importance with the emergence of Big Data and the evident increase in the volume of digitized data. A clear example of this need to optimize searches is the Google search engine, which must return results drawn from the entire web as accurately and with as much personalization as possible, based on a query of just a few words. In May 2012, Google launched its Knowledge Graph with the intention of improving its search engine by performing semantic searches (things, concepts) rather than lexical searches (text strings).
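The contrast between lexical and semantic search can be illustrated with a small sketch in which both queries and documents are mapped to concept identifiers before matching, so that equivalent expressions return the same results. The synonym table, concept identifiers and documents are hypothetical, and phrase lookup by substring is a deliberate simplification.

```python
# Hypothetical synonym table: surface expression -> concept identifier.
SYNONYM_TO_CONCEPT = {
    "heart attack": "C_MI",
    "myocardial infarction": "C_MI",
    "high blood pressure": "C_HTN",
    "hypertension": "C_HTN",
}

# Hypothetical document collection.
DOCUMENTS = {
    "doc1": "Patient admitted with myocardial infarction.",
    "doc2": "History of hypertension, no heart attack reported.",
    "doc3": "Routine check-up.",
}


def concepts_in(text):
    """Return the set of concepts whose expressions appear in the text."""
    lowered = text.lower()
    return {
        concept
        for phrase, concept in SYNONYM_TO_CONCEPT.items()
        if phrase in lowered
    }


# Index documents by concept once, instead of by raw text strings.
INDEX = {doc_id: concepts_in(text) for doc_id, text in DOCUMENTS.items()}


def semantic_search(query):
    """Rank documents by the number of concepts shared with the query."""
    query_concepts = concepts_in(query)
    scored = [
        (len(query_concepts & doc_concepts), doc_id)
        for doc_id, doc_concepts in INDEX.items()
    ]
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [doc_id for score, doc_id in ranked if score > 0]
```

Here `semantic_search("heart attack")` and `semantic_search("myocardial infarction")` return the same list, whereas a purely lexical search for "heart attack" would miss `doc1`. Note that `doc2` is also returned even though it negates the term: substring matching ignores negation and context, which is exactly the kind of disambiguation a real system has to handle.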
Semantic searches in the health sector
At Indizen we have extensive experience in the health sector, where this type of search is vital to support health care and medical decision-making. Nowadays, there is a great deal of clinical information from different sources and in different formats: structured information such as clinical terminologies, disease classifications, etc., as well as unstructured or semi-structured information expressed in natural language. The use of natural language is itself a further problem when searching, due to the huge number of synonyms and polysemous terms that exist. The rest of the terms in the query must be taken into account to obtain more information about the context and to try to disambiguate them.
To perform semantic searches within the health context, we rely on elements such as a metathesaurus, which includes all the concepts of interest from the different health subdomains, and a semantic network capable of organizing those concepts hierarchically and categorizing them. The semantic network establishes relationships between the concepts defined in the metathesaurus, giving them semantic value and refining their meaning.
To enrich these structures we use different standardized terminologies and classifications from the health field, such as SNOMED CT, which is the most comprehensive clinical terminology in the world, including more than 400,000 clinical concepts, as well as hierarchical and defining relationships between these concepts that complete their semantics. Each concept has multiple associated descriptions and synonyms in different languages, thus forming a vocabulary for the clinical domain.
An effort is currently under way to extend and enrich these tools by incorporating new information resources and standardizing their content. To this end, we are working on generalizing the concepts collected in the metathesaurus so that they can be related to other widely used terminological classifications, such as the categorization established by the National Library of Medicine in the UMLS (Unified Medical Language System).
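The interplay between a metathesaurus and a semantic network can be sketched with two small dictionaries: one mapping concepts to their terms and one holding hierarchical "is-a" relationships. The concept identifiers, terms and relationships below are hypothetical and are not taken from SNOMED CT or the UMLS.

```python
# Hypothetical metathesaurus: concept identifier -> terms and synonyms.
METATHESAURUS = {
    "C1": ["disease"],
    "C2": ["heart disease", "cardiopathy"],
    "C3": ["myocardial infarction", "heart attack"],
    "C4": ["lung disease"],
}

# Hypothetical semantic network: child concept -> parent concept ("is-a").
IS_A = {"C2": "C1", "C3": "C2", "C4": "C1"}


def ancestors(concept_id):
    """Walk the is-a chain upwards and return every ancestor, nearest first."""
    chain = []
    while concept_id in IS_A:
        concept_id = IS_A[concept_id]
        chain.append(concept_id)
    return chain


def descendants(concept_id):
    """Return every concept that has `concept_id` among its ancestors."""
    return {c for c in METATHESAURUS if concept_id in ancestors(c)}


def expand_terms(concept_id):
    """Collect the terms of a concept and of all its more specific concepts."""
    concept_ids = {concept_id} | descendants(concept_id)
    return sorted(term for cid in concept_ids for term in METATHESAURUS[cid])
```

With this structure, a query for the general concept `C2` can be expanded to also match documents that only mention the more specific terms of `C3`.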
Big Data: Data Governance & Data Quality
THE IMPORTANCE OF DATA QUALITY:
Today, information has become a key organizational resource: data is the raw material, and managing it well is becoming a key factor for organizations. At Indizen we ask you: are you working with quality data? Is it really a relevant aspect for companies?
Data quality is key in projects; we cannot forget that:
Data by itself does not generate business. ITS MANAGEMENT DOES.
At Indizen we build our developments on the understanding that data without quality loses its value for companies. Our aim is a system with quality information that reflects the real situation of the company and allows the user to keep proper control of it.
Within information management, there are disciplines whose missions are interlinked in order to manage data as a valuable resource.
Of these disciplines, we will focus on two that are very important in the world of information management: Data Governance and Data Quality.
DATA GOVERNANCE: WHAT, WHEN AND WHY
Data Governance is the discipline in charge of orchestrating people, processes and technology to enable information as a resource of business value. At the same time, it is responsible for keeping users, auditors and regulators satisfied, using data quality improvement to retain customers and to establish and guide new market opportunities.
Its objectives are:
• Enable better decision-making.
• Reduce the operating load.
• Protect the needs of staff with an interest in data.
• Protect the needs of each of the different areas with an interest in data.
• Establish a set of standards, processes and policies to govern data at a corporate level.
• Reduce costs and increase effectiveness through coordination of efforts.
• Ensure transparency of processes.
And its functions are:
• Establish goals: Key statements that guide the operation and development of the information supply chain.
• Define metrics: A set of measures used to evaluate the effectiveness of the program and associated governance processes.
• Make decisions: Organizational structure and ideological change model to analyse and create decision-making policies.
• Communicate policies: Tools, skills and techniques used to communicate policy decisions to the organization.
• Measure results: Compare policy results with goals, inputs, decision and communication models to provide constant feedback on policy effectiveness.
• Audit: Tool used to check all of the above.
When do we use Data Governance? An organization needs to move from informal data management to Data Governance when, for example, traditional management or data systems cannot cope with cross-functional data-related activities, or when regulations, standards, compliance and requirements call for more formal Data Governance.
Principles of Data Governance:
• Control and Balance.
• Change management.
DATA QUALITY: WHAT IT IS AND WHAT IT CONSISTS OF
It is the discipline in charge of keeping an organization's information complete, precise, consistent, up to date, unique and, most importantly, valid.
This discipline is present in the data analysis phases, and it is also important for generating data quality monitoring indicators, which let us know day by day whether we are deviating from our objective.
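Monitoring indicators of this kind can be sketched as simple ratios computed over a data set. The example below computes three of the dimensions listed above (completeness, uniqueness and validity); the records, the field names and the e-mail pattern are hypothetical, and the functions assume a non-empty data set.

```python
import re

# Hypothetical customer records with typical quality problems.
records = [
    {"id": 1, "email": "ana@example.com"},
    {"id": 2, "email": None},                 # missing value
    {"id": 3, "email": "not-an-email"},       # invalid format
    {"id": 3, "email": "luis@example.com"},   # duplicated identifier
]

# Deliberately loose pattern, used only for illustration.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, field):
    """Share of rows in which the field is present and not empty."""
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)


def uniqueness(rows, field):
    """Share of distinct values among all values of the field."""
    values = [row[field] for row in rows]
    return len(set(values)) / len(values)


def validity(rows, field, pattern):
    """Share of non-empty values that match the expected format."""
    present = [row[field] for row in rows if row.get(field)]
    return sum(1 for value in present if pattern.match(value)) / len(present)


indicators = {
    "email_completeness": completeness(records, "email"),
    "id_uniqueness": uniqueness(records, "id"),
    "email_validity": validity(records, "email", EMAIL_PATTERN),
}
```

Computing these values on a schedule and comparing them against agreed thresholds is one way to detect a deviation from the quality objective early.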
The first step, before acquiring any Data Quality tool, is to detect and prioritize focus areas, such as critical data elements, in order to identify what is critical for the business or simply the value of the data, which is nothing other than the risk associated with poor data quality. Once we have identified where we want to concentrate our data quality efforts, we move on to using the tool.
DATA QUALITY - INFORMATION - DECISION MAKING
THE 6 KEY PROCESSES OF DATA QUALITY:
1. Discovery: exploring undocumented models and data sources, and quickly measuring and identifying them.
2. Definition: a data audit that identifies and quantifies data quality problems across all sources. Its objective is to generate a tangible baseline measure of data quality and to find out whether there are duplicate data sources, redundant attributes, etc.
3. Cleansing: this process defines the rules and goals to be achieved.
4. Matching: this consists of defining and designing the cleansing, standardization and consolidation rules.
5. Consolidation: in this stage, the data quality improvement processes previously defined in the cleansing process (point 3) are implemented.
6. Monitoring: once all processes have been implemented, reports are generated with the results obtained, including improvements and alerts.
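The later stages of this cycle can be sketched as a small pipeline: a standardization rule, a matching key, a consolidation step that applies both, and a monitoring report. This is only one possible illustration under simple assumptions; the records, the rules and the function names are hypothetical and do not describe a specific Data Quality tool.

```python
# Hypothetical contact records containing the same person written in two ways.
raw_records = [
    {"name": "  Ana  García", "phone": "600 111 222"},
    {"name": "ana garcía", "phone": "600-111-222"},
    {"name": "Luis Pérez", "phone": "600333444"},
]


def standardize(record):
    """Standardization rule: normalize case and spacing, keep only phone digits."""
    return {
        "name": " ".join(record["name"].lower().split()),
        "phone": "".join(ch for ch in record["phone"] if ch.isdigit()),
    }


def match_key(record):
    """Matching rule: two standardized records are the same entity if both fields agree."""
    return (record["name"], record["phone"])


def consolidate(records):
    """Apply the rules and keep the first record found for each matching key."""
    merged = {}
    for record in map(standardize, records):
        merged.setdefault(match_key(record), record)
    return list(merged.values())


def monitoring_report(before, after):
    """Summarize the effect of consolidation for reporting and alerts."""
    return {
        "input_records": len(before),
        "output_records": len(after),
        "duplicates_removed": len(before) - len(after),
    }


clean_records = consolidate(raw_records)
report = monitoring_report(raw_records, clean_records)
```

Exact matching on standardized fields is the simplest possible rule; fuzzy matching on similar names would be a natural refinement in the rule-design stage.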