Graph Thinking: Connectivity: Connected Components

Connectivity is an important concept in graph thinking. From the mathematics of connectivity in graph theory, we can identify communities or clusters of nodes/entities. One way of identifying communities in a graph is to find its connected components. In a connected component, each node is reachable from every other node in the component. In a directed graph there can be weakly connected components and strongly connected components. Weak and strong refer to how the direction of the edges/relationships between nodes/entities is treated: a weakly connected component only requires that nodes be reachable when edge direction is ignored, while a strongly connected component requires a directed path in both directions between every pair of nodes.
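To make the difference between weak and strong connectivity concrete, here is a minimal sketch using Python and the networkx library; the tiny directed graph and its node names are invented for illustration.

    import networkx as nx

    # A small, made-up directed graph: A -> B -> C -> A forms a cycle,
    # while D only points into the cycle and nothing points back to D.
    g = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "A"), ("D", "A")])

    # Weakly connected components ignore edge direction, so all four
    # nodes land in one component: [{'A', 'B', 'C', 'D'}]
    print(list(nx.weakly_connected_components(g)))

    # Strongly connected components require directed paths both ways,
    # so D splits off on its own: [{'A', 'B', 'C'}, {'D'}] (order may vary)
    print(list(nx.strongly_connected_components(g)))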

Finding connected components in a graph database is an important first step in graph analysis. Graph databases quickly become too big and complex to visualize readily, and the connected components of a complex system are hard to see by eye. Understanding connected components helps us understand the underlying structure of the graph for further analysis. We might discover groups of entities that we didn’t know about before: a community cut off from the rest of the system, a vulnerability in our wide area network, or a shared interest or expertise in a group of professionals.

A major challenge for enterprise databases is avoiding duplication of records. “The problem of detecting duplicates in a database can be described in terms of determining the connected components of an undirected graph.”1 Connected components have also been used to analyze bodies of published papers: “Citation graphs representing scientific papers contain valuable information about levels of scholarly activity and provide measures of academic productivity.”2 To understand the influence and power of multinational corporations, “a complex network analysis is needed in order to uncover the structure of control and its implications.”3 The authors found a strongly connected component at the center, “in other words, this is a tightly-knit group of corporations that cumulatively hold the majority share of each other.”3

Connected components also play a key role in fraud detection.4 A ring of fraudsters can be identified by users who share the same address or phone number while opening a variety of credit accounts within a defined time period. In social networks, connected component searches can find groups of strongly connected people not previously identified, and connected component analysis can unlock patterns in the groups identified.
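As a rough sketch of the fraud-ring idea, suppose we link every application to the address and phone number it uses and then take connected components; the applications and field values below are invented for illustration.

    import networkx as nx

    # Invented credit applications: (application id, address, phone)
    applications = [
        ("app1", "12 Oak St", "555-0101"),
        ("app2", "12 Oak St", "555-0199"),
        ("app3", "98 Elm Ave", "555-0199"),
        ("app4", "7 Pine Rd", "555-0300"),
    ]

    g = nx.Graph()
    for app_id, address, phone in applications:
        # Connect each application to the identifiers it uses.
        g.add_edge(app_id, ("address", address))
        g.add_edge(app_id, ("phone", phone))

    # Each connected component groups applications chained together by
    # shared identifiers; app1, app2, and app3 form one suspicious cluster.
    for component in nx.connected_components(g):
        print(sorted(n for n in component if isinstance(n, str)))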

In the powerful realm of graph thinking, connectivity is an important concept to understand and apply. Modeling our data as a graph enables connected component searches that discover patterns we cannot find in a relational model. Do you think about your data in terms of connectivity? Do you think about your data in terms of communities or cluster patterns? Do you want to? If you want to learn more about graph thinking, please contact me and visit Graph Thinking Workshops.

1. Monge, Alvaro E., and Elkan, Charles P., An Efficient Domain-Independent Algorithm for Detecting Approximately Duplicate Database Records, 1997. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.8405

2. An, Yuan, Janssen, Jeannette, and Milios, Evangelos E., Characterizing and Mining the Citation Graph of the Computer Science Literature, 2004. http://www.cs.toronto.edu/~yuana/research/publications/kais.pdf

3. Vitali, Stefania, Glattfelder, James B., and Battiston, Stefano, The Network of Global Corporate Control, 2011. http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0025995&type=printable

4. Scifo, Estelle, Hands-On Graph Analytics with Neo4j, Packt Publishing, 2020, page 308.

Why Graph Thinking Workshops?

Graph Thinking context: Software Application Development and Scaled Agile Framework

In 1975, Fred Brooks exposed the challenges of software application development. In The Mythical Man-Month, Brooks enumerates and quantifies his observations of large software application development projects. From his work, we can distill a powerful idea to help us estimate the effort required to deliver software applications:
Coding is secondary to design, communication, and testing.

For some years I have been successfully using the following rule of thumb for scheduling a software task:
1/3 planning
1/6 coding
1/4 component test and early system test
1/4 system test, all components in hand.
This differs from conventional scheduling in several important ways:
1. The fraction devoted to planning is larger than normal. Even so, it is barely enough to produce a detailed and solid specification, and not enough to include research or exploration of totally new techniques.
2. The half of the schedule devoted to debugging of completed code is much larger than normal.
3. The part that is easy to estimate, i.e., coding, is given only one-sixth of the schedule.
-Fred Brooks, The Mythical Man-Month.
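To see what this rule of thumb means in practice, here is a tiny worked example, assuming a hypothetical 12-week project, that splits the schedule by Brooks's fractions.

    from fractions import Fraction

    # Brooks's rule of thumb expressed as fractions of the schedule.
    rule_of_thumb = {
        "planning": Fraction(1, 3),
        "coding": Fraction(1, 6),
        "component test and early system test": Fraction(1, 4),
        "system test, all components in hand": Fraction(1, 4),
    }

    total_weeks = 12  # hypothetical project length
    for phase, share in rule_of_thumb.items():
        print(f"{phase}: {float(share * total_weeks)} weeks")

    # The fractions sum to 1; coding gets only 2 of the 12 weeks.
    assert sum(rule_of_thumb.values()) == 1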

So much has been written about the struggle of software application development management since then that I will not add to it here. But I can speak personally to the struggle Fred Brooks described in 1975. Reading The Mythical Man-Month today is like reading part of my autobiography, full of mistakes I have made and lessons I have learned over almost 30 years implementing new systems, processes, and software. For the sake of this article, I will end the story of this struggle by pointing out that agile frameworks, for example the Scaled Agile Framework (SAFe), are the best answer to Fred Brooks's formula for a successful project. Again, so much has been written on this that I will not repeat it.

Let us think about Fred Brooks's points about design, communication, and testing. Besides engineers and programmers, who designs, communicates, and tests? Per agile frameworks, the stakeholders, business owners, product managers/owners, and clients all play crucial roles in the software development life cycle. If we look at an agile framework such as SAFe, can we readily see where the coding happens? No. Why? Because SAFe creates a framework in which design, communication, and testing are prioritized over coding and building. (Engineers and programmers, I know this hurts but it shouldn't. It is all about supporting you, per Fred Brooks and so many others. -GM)

Graph Thinking to Graph Database Application Development

40 years of RDBMS and SQL and NoSQL

We have all seen technology advancements happen and change our world. But consider: how much has database thinking and technology really changed in 40 years? Relational Database Management Systems are the standard for most applications. Structured Query Language is the only ANSI and ISO standard for a database language. RDBMS have gotten bigger as hardware and algorithms enable scaling. We adapted to different data formats from Big Data with NoSQL databases. But as we look more closely at NoSQL variants, aren't they really just variants of tables, fit to model different shapes of rows and columns, or using tables to index documents? For 40 years, rows and columns, tabular, general-ledger shapes have molded, and limited, how we think about data. For 40 years, our applications have been built upon row and column data models, and our analyses along with them.

New Technology Wave of Graph Database Management Systems

There is a new wave of database technology hitting, and it is as significant as, or more significant than, RDBMS and SQL. It will change how operational and analytical databases are stored, accessed, and analyzed. It will change how we build software applications of all types and sizes. Just as data tables and SQL are found everywhere today, this database technology will be found everywhere in the near future. This new wave of database technology is Graph Database Management Systems, GDBMS. The new ANSI and ISO language is currently called GQL, Graph Query Language. To succeed in the future we must learn a new way of thinking about our data and applications. That new way of thinking is graph thinking.

Today, if you google graph thinking, you don't get much at all related to how to think about data and applications as graphs. Often there is confusion between graphs and charts. Almost all the web content on graph databases and graph analytics is geared toward engineers and data scientists. As business owners, stakeholders, product managers, scrum masters, and data analysts, do you know about graph thinking, graph databases, and graph analysis? Do you know how to collaboratively look at your domain and create a graph data model? Can we talk with engineers about graph applications? How can we develop the next generation of software applications for our internal and external clients without understanding the advantages, power, and realm of graph theory, graph thinking, graph analysis, and graph databases? How do we design, communicate, and test in terms of graph data?

The most important function that software builders do for their clients is the iterative extraction and refinement of the product requirements. For the truth is, the clients do not know what they want. They usually do not know what questions must be answered, and they almost never have thought of the problem in the detail that must be specified. -Fred Brooks, The Mythical Man-Month

Our entire agile team, from stakeholders and business owners to product managers, product owners, scrum masters, analysts, and engineers, must learn graph thinking and how to apply it to their domain. Then our agile teams at any scale will be able to use graph thinking to create new, innovative, valuable, insightful applications. We will be able to ask new questions and get answers to the most complex problems we have. We will create analytical reports and dashboards using graph algorithms and visualizations. By bringing graph thinking right into a software development framework like SAFe, we can accelerate the adoption of graph database application development and graph data analysis.

Transition to Graph Thinking

How do we get the entire agile team to a new paradigm? How can we get the entire agile team thinking in graphs and delivering cutting-edge graph database applications? Graph Thinking Workshops for Agile Development is how. Each workshop is designed for the specific agile roles, as defined by SAFe, that team members play. The workshops are not limited to SAFe and agile development; SAFe simply gives us a proven framework for defining roles. So if you identify with any of these roles, these workshops are for you. If you just want to know more about graph thinking and how to become a graph thinker, these workshops are for you. If you want your organization to dominate the next generation of digital services, these workshops are for you and your team. If you want to accelerate the wave and ride it, these workshops are for you.

Graph Thinking Workshops for Agile Development Teams

Graph Thinking for Product Management

Graph Thinking for Business Owners and Stakeholders

Graph Thinking for Scrum Masters

Graph Thinking for Architects and Engineers

Graph Thinking for Data Analysts

Food-Trak Import Exception Reports

Last year my post about using Pentaho Data Integration to extract Micros 9700 sales data for importing into Food-Trak discussed exception reporting. Well, a year has passed and I am pleased to report that I have made a large improvement to the way we handle exception reporting. (And the Micros 3700 extracts I wrote about in August are working perfectly.) For the last couple of months I have been mentoring two very talented individuals so they can take over all my Micros 9700 and Food-Trak operational responsibilities, which lets me focus more on Business Intelligence and dashboarding using the Pentaho Business Intelligence Suite.

Our company’s restaurants must keep their menus fresh and relevant. This means that quarterly major revisions are done, and monthly new items are added and old ones removed. We have utilized the menu engineering capabilities of Food-Trak for two of our biggest restaurants to help us do this. On the data extract side of things, it is very challenging to keep up with all of these changes. It is relatively quick to add a new menu item to the Micros 9700 and Micros 3700 systems. The time-consuming part is entering a plate recipe into Food-Trak and completing the sales record, especially when there are 15 new items in one restaurant, 10 new items in another, and another adds a few drink specials!

The way I was handling the exception reports before was by copying and pasting the audit report created by Food-Trak. Then I would save the text file and run it through a Pentaho transformation to create an Excel spreadsheet, which included the menu item name from Micros. This was still labor-intensive, but it was easier than searching through the Micros EMC for each menu item number and name.

This month, my two proteges handled entering all the new menu changes that rolled out for the 4th quarter. There was a major overhaul of the menu in our largest restaurant, and several changes in the others. I was able to spend my month creating dashboards, cubes, transformations, and jobs with the Pentaho Business Intelligence Suite. One project completed this month was automating the Food-Trak exception reports. I also created a current check analysis dashboard for all the restaurants and prototyped a KPI dashboard for our hotel division.

The Food-Trak exception reports are created using Pentaho Data Integration. Instead of copying and pasting the audit report and filtering out the ‘not founds’, I query the POS Audit table in each Food-Trak database (we use three) for the ‘not founds’. Then, I simply join the results to the Micros 9700 and Micros 3700 menu item masters to extract the menu item names and output the results to an Excel file and a text file.
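The work above is done inside Pentaho Data Integration, but purely as an illustration of the logic, here is the same join sketched in Python with pandas; the connection strings, table names, and column names are placeholders, not the actual Food-Trak or Micros schema.

    import pandas as pd
    from sqlalchemy import create_engine

    # Placeholder connections; the real servers, databases, and
    # credentials for Food-Trak and Micros will differ.
    foodtrak = create_engine("mssql+pyodbc://user:pass@foodtrak_dsn")
    micros = create_engine("mssql+pyodbc://user:pass@micros_dsn")

    # Pull the 'not found' rows from the POS audit table (placeholder names).
    not_found = pd.read_sql(
        "SELECT pos_code FROM pos_audit WHERE status = 'Not Found'", foodtrak
    )

    # Join to the Micros menu item master to recover the item names.
    menu_items = pd.read_sql(
        "SELECT menu_item_number, menu_item_name FROM menu_item_master", micros
    )
    exceptions = not_found.merge(
        menu_items, left_on="pos_code", right_on="menu_item_number", how="left"
    )

    # Output the exception report to both Excel and text files.
    exceptions.to_excel("exceptions.xlsx", index=False)
    exceptions.to_csv("exceptions.txt", sep="\t", index=False)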

Food-Trak requires a specific format for the POS code. To make it easier to enter the code into the Food-Trak sales record, I create the POS code already formatted to the Food-Trak specifications. Then our Food-Trak admins just have to copy and paste the code into the sales record, saving a little time and effort for each entry.

Using Pentaho Data Integration, I then created a job which runs the transformation and emails the files to the Food-Trak admins, their manager, our F&B Director, and myself. I also send a results email to myself so I know whether the jobs and transformations worked. Thus, today we automatically upload hundreds of sales records each morning for every restaurant (there are currently six in all and a few more in the works), and auto-create the exception reports, which are delivered to the appropriate parties before the work day starts.

Today I created a Micros 3700 to Food-Trak transformation with Pentaho Data Integration. The SQL was the easy part. The hard part was creating the JDBC connection. At least it was hard for me since I haven’t connected to Sybase since 1998. But after 4 hours of googling and experimenting, I figured out the connection configuration and created the transformation.

I used jConnect and modified the Spoon.bat file to include its path in the classpath variable. I ran into a problem with column names being passed through the stream, but I created a workaround and presto, change-o ... a file to import into Food-Trak.
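For anyone fighting the same battle, a typical jConnect connection follows the pattern below. Purely for illustration (the actual connection is configured inside Pentaho's database connection dialog), here is a sketch using the Python jaydebeapi library; the host, port, database name, credentials, jar path, and query are placeholders.

    import jaydebeapi

    # Typical jConnect settings: the driver class and the jdbc:sybase:Tds
    # URL form are standard for jConnect; everything else is a placeholder.
    conn = jaydebeapi.connect(
        "com.sybase.jdbc4.jdbc.SybDriver",
        "jdbc:sybase:Tds:micros-host:2638/micros",
        ["username", "password"],
        "/path/to/jconn4.jar",
    )
    cursor = conn.cursor()
    cursor.execute("SELECT COUNT(*) FROM menu_item_master")  # placeholder query
    print(cursor.fetchone())
    conn.close()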

Tomorrow I will test and validate the extract. Once I am 100% sure it works, I will write another post about it.

The Micros 9700 data extract for Food-Trak integration is a step in the Micros 9700 ETL process. The core transformation extracts Micros 9700 transaction data from the MCRSPOS database. The individual sales transactions are identified in the totals table. The Food-Trak format required for the Micros 9700 interface must include the revenue center, the menu item master number, the menu item definition sequence number, and the menu item price number. The transformation merges these dimensions with the transaction record from MCRSPOS.Totals.
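As a rough sketch of that merge, with invented stand-ins for the MCRSPOS tables and columns (the real transformation is built in Pentaho and the actual Micros 9700 schema differs):

    import pandas as pd

    # Invented stand-in for the MCRSPOS totals records.
    totals = pd.DataFrame({
        "revenue_center": [1, 1, 2],
        "menu_item_master": [1001, 1002, 1001],
        "definition_sequence": [1, 1, 2],
        "price_number": [1, 3, 1],
        "quantity": [2, 1, 4],
    })

    # Invented stand-in for the menu item dimension.
    menu_item_master = pd.DataFrame({
        "menu_item_master": [1001, 1002],
        "menu_item_name": ["House Burger", "Draft Pint"],
    })

    # Merge the dimension onto each transaction record, producing rows
    # keyed by the four fields the Food-Trak interface requires.
    foodtrak_rows = totals.merge(menu_item_master, on="menu_item_master")
    print(foodtrak_rows[[
        "revenue_center", "menu_item_master",
        "definition_sequence", "price_number", "quantity",
    ]])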

Pentaho Data Integration transformation

The entire extraction and transformation is written with Pentaho Data Integration, an open source extraction, transformation, and loading (ETL) application. The graphical user interface of Pentaho Data Integration allows rapid application development, and it is relatively easy to customize the extract using simple SQL statements in the transformation's table input steps.

A required step for Micros 9700 and Food-Trak data integration is the creation of product records for each product sold. Each product record is attached to either a recipe or a purchased item in Food-Trak. The Micros 9700 Food-Trak interface requires the revenue center number, the menu item master number, the menu item definition sequence number, and the menu item price number. Creating product records is a very time-consuming process and requires access to the Micros 9700 EMC and a deep understanding of the products sold.

My experience with my company is that our restaurants actually have several thousand products.  Remember from my last article that one of our requirements is fine granularity. Therefore, we can have several product records for one item. For example, a food menu item might have a regular price, a half off student price, and a late night menu price. A draft beer will have a double draft regular price, a double draft happy hour price, a double draft Monday Ladies’ night price, a double draft half off price, a pint regular price, a pint happy hour price, ….

We also have very dynamic menus. Each restaurant changes their menu at least seasonally. Seasonal dishes are offered with various themes or featuring fresh local food. New wines are offered monthly, new beers and liquors appear weekly. Our business strategy requires us to change our menu mixes often, while, of course, maintaining guest favorites. Our business strategy also requires us to manage our food and beverage assets at a very granular level. The Micros 9700 and Food-Trak interface is in a near constant state of flux. Thus, menu changes to Micros and Food-Trak product record updates are all handled by Corporate IT.

Especially at the beginning of the Micros 9700 Food-Trak data integration project, exception reporting is very useful. The Food-Trak Micros 9700 interface creates a results report in which exceptions are flagged <Not Found>. By copying and pasting this report into a text file, it can be merged with Micros data via Pentaho Data Integration. The transformation below reads in the results report, attaches the menu item name to each record, and appends a quantity column.

Food-Trak Micros 9700 data integration exception report
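Purely as an illustration of what that transformation does, here is a sketch in Python with pandas, assuming a tab-delimited paste of the results report and invented column and file names; the default quantity of 1 is also an assumption.

    import pandas as pd

    # Read the pasted results report (assumed tab-delimited; the real
    # Food-Trak layout may differ) and keep only the exceptions.
    report = pd.read_csv("foodtrak_results.txt", sep="\t")
    exceptions = report[report["result"] == "<Not Found>"].copy()

    # Attach the menu item name from a Micros menu item master extract
    # and append a quantity column for the new sales records.
    menu_items = pd.read_csv("micros_menu_item_master.csv")
    exceptions = exceptions.merge(menu_items, on="menu_item_number", how="left")
    exceptions["quantity"] = 1  # assumed default

    exceptions.to_excel("exception_report.xlsx", index=False)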

The Micros 9700 transformation is run automatically using the Pentaho Data Integration (Kettle) command line tools. The automated transformation saves a date stamp variable which is inserted into the SQL code of the table input step. A batch file containing a one-line Kettle command is scheduled via Windows Task Scheduler.
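For reference, the one-line command in that batch file typically looks something like the line below; the transformation path and logging level are placeholders, and the exact Pan options depend on the PDI version.

    Pan.bat /file:"C:\etl\micros9700_foodtrak.ktr" /level:Basic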

My company owns several food and beverage services: restaurants, banquet and catering, concessions, and cafeterias. In order to control costs and manage assets, we implemented the Food-Trak Food and Beverage Management System. For my manufacturing friends, Food-Trak is the equivalent of Enterprise Resource Planning for the Food and Beverage Industry. And while we think of Food-Trak this way, consider the thought that a restaurant is a manufacturing facility which manufactures product on demand and in real time.

Our goal as a company is to understand and manage our food and beverage assets at a very granular level. Thus, we (attempt to) count every fraction of an ounce of what is served. To this end, we must record every serving in our Point of Sale system (POS), Micros 9700 HMS, and feed these transactions into our Food and Beverage Management System, Food-Trak.

In operations, the POS workstation must be programmed to capture the servings. The challenges we faced were in the realm of modifiers: options, sides, condiments, add-ons, extras. Because of the complexity of programming modifiers and condiments, and because of a few years of decentralized programming, many things were neglected. Operations had gotten used to massaging things, using Open transactions, and making assumptions. After a couple of months of reconfiguring and testing, we were able to program the Micros 9700 system to capture transactions at the level required without hampering operations. In fact, the changes improved operations, as servers had more automated choices presented in a logical way and the kitchen had to do less interpretation of orders, i.e., better communication between the front of house and the kitchen.

The bigger challenge was extracting the sales data from the Micros 9700 HMS. Organizations that want to integrate their Micros 9700 sales data into Food-Trak have three options:

  1. Contract with Micros to do it.
  2. Contract with their Micros vendor to do it.
  3. Do it yourself.

Contracting with Micros or a vendor requires significant investment. If you are reading this because you want to integrate Micros 9700 sales data with Food-Trak, and you have talked to your vendor, you know this is true. Micros also doesn’t provide the best customer support in the world; if you are a Micros customer, and not IHG, you know this is true. Furthermore, the interfaces I was able to review were nothing more than scripts written in SIMM, a proprietary Micros language. Ingenious? Yes. Practical and user friendly? Not at all.

Since I have been writing data extracts since the early 1990s, the decision to do it ourselves was an easy one for me to make, although it was met with a bit of skepticism from some of the decision makers. My first attempt was to extract data from the Netvupoint database. Easy, right? Micros already has a built-in data warehouse; all I had to do was extract the total sales from the Netvupoint tables. Well, during my validation testing, I discovered I couldn’t get to the granularity we required from Netvupoint. I had to dive into the transaction database.

I will save the details of my adventure within the transaction database for another time. When you hire me to set up your extracts, I will share the story with you over coffee! In a few weeks of focused effort, I was able to extract the transactions at the granular level we required, and today we have a hit rate of >98%. The exceptions are mostly open items, and the rest are data entry errors. I use an open source ETL tool called Pentaho Data Integration. The extract procedure is automated and has a graphical user interface. To aid in validation testing and monitoring of extracts, I have also created an ETL procedure that reads the exception reports from Food-Trak and matches the “Not Found” menu item number with its name entry in the menu item master table in Micros.

Because of our integration of Micros 9700 sales data and our Food-Trak production data, we are able to create industry standard sales analysis reports on demand and to an incredible depth. I would enjoy talking to you about this experience and look forward to helping you integrate your Micros 9700 sales data with Food-Trak.

Indeed, the car dashboard analogy is used extensively in the business performance dashboard business. In my quest to create meaningful, valuable dashboards, I must always keep an open mind. While I thought the business tachometer was a good idea, and maybe it is still OK, I must take heed of Dr. Nicholas Bissantz’s critique of the car metaphor. http://blog.bissantz.com/tachometer

Dr. Bissantz’s work is extraordinary. I will be adding sparklines to my list of graphs to work on. http://www.bissantz.com/sparklines/index.asp

Because the car dashboard metaphor is still ubiquitous in the business performance dashboard world, dials and gauges will, for the time being, remain as dashboard component widgets, just as pie charts and bar charts are still used. However, killer business performance dashboards must adopt sparklines.

Why do things break down? Are there predictable patterns in malfunctions that can be expressed in terms of Mercury retrograde? My personal experience says yes. So do the headlines.

http://www.astrologyweekly.com/astrology-articles/mercury-retrograde.php

http://www.itsecurity.com/features/cable-cut-conspiracy-020708/

http://www.cnn.com/2008/TECH/02/08/internet.outage/

http://ca.reuters.com/article/technologyNews/idCAN1114968920080211

http://abcnews.go.com/Business/story?id=3052158