SQL Query to Get Column Name & Value Pairs | Understanding Database Structure

pythonsql.png

SELECT * FROM SAMPLE_TABLE FOR JSON PATH



The line of SQL code above is what I have been looking for in the last couple of weeks. I always get excited when I discover solutions for problems. Seeing problems resolved with one or two lines of code is just beyond amazing. If you are a database expert, this is probably nothing for you and you may have used this trick many many times. I am not a database expert and I don't work with databases a lot. My interaction with SQL are limited to smaller databases for smaller apps. The biggest database I have had experience with is HiveSQL. For this reason I haven't had a need to use the FOR JSON PATH before. If I knew I probably would have used it more often.

The one liner code above returns column_name/value pairs in json format. I have written hundreds of query scripts, but never had a need for the query to return column names. Most of the time since the databases I used were very small, it would be obvious which value belonged to what column. Now imagine if you are introduced to a new database with thousands of tables and hundreds of thousands of columns. This may become a challenge to figure out how data is stored and connections between data in different tables. It is one of my summer projects. I am sure this will be an interesting, challenging, and learning experience. Luckily, it is not of high priority and I have three months to work on it.

I use SQL within python. I find using SQL within python scripts more flexible, efficient, and fast. Python is great, and it can work with all kinds of databases. This specific project involves database hosted on a MS SQL server. Usually when I start working with new databases, I have problems setting up initial connection. This time I had zero issues, and machine seemed to be properly connected. Initial connection test below worked without any issues.

import pymssql

def db_connection_test(db):
    try:
        conn = pymssql.connect(server=db[0], user=db[1], password=db[2], database=db[3], port=db[4])
        conn.close()
        return True
    except pymssql.Error as e:
        print(f"Connection error: {e}")
        return False

def main():
    db =["server", "user", "pw", "database", "port"]
    if db_connection_test(db):
        print("Connection successful!")
    else:
        print("Connection failed.")

if __name__ == "__main__":
    main()

I was surprised. Usually, I always run into some errors when setting up initial configuration for connection. What this meant was I indeed had the access to the database. Sure it may have thousand or so tables, and hundreds of thousands of columns. All I had to do was to get the all the tables with their columns and save them in a spreadsheet, and everything would be clear. That too went super smooth. Luckily I already have done this with HiveSQL in the past.

sql_command = '''
SELECT table_name, column_name
FROM INFORMATION_SCHEMA.COLUMNS
ORDER BY table_name
'''

The reason SQL query is assigned to a variable is because that's how I use them within a python script. After more processing of the result of the query I was able to save all tables and columns in one spreadsheet. Now I had the map for the database. All I had to figure out now is the tables that I would be using and columns where needed data is stored. It became evident very quick that I wouldn't be able to go through all the tables and columns one by one to understand what they store. The naming was done well, but there are just too many tables and each has 40-50 columns. Most of the data in these columns I will never use. But a few columns in probably dozens of tables will need to be explored.

I started querying data from each table one by one. I kept getting data I just couldn't understand what was happening. Part of the reason I couldn't understand what the data was because it wasn't clear if the order of data returned and order of columns in my spreadsheet matched. They probably didn't. Even if they did, it would take too long to match them one by one. I knew there was a better way. All I needed was a query that would return column name and its value in pairs. This way I wouldn't have to guess anymore. That is exactly what the one line code at the top of the post does. Now that I have this solution, I already know how exploring this database will be easier.

Column name and value pairs will absolutely make it easier to understand the data. But more importantly I think now I may be able to loop through many tables and identify the columns I needed based on sample data value. Not that I have all the tools needed, I think I am ready to go deeper into the structure of the database. While I am sure it will be an awesome learning experience, I keep second guessing myself if it was a right decision to take on this project. It is not even what I wanted to do in the first place. It is one of those stories when you find a bug or a problem, present a solution, then you ended being tasked to solve the problem.

I normally don't deal with large databases directly, but rather large size documents like pdf, word, excel, etc. With python manipulating data, identifying necessary parts, throwing out unnecessary ones, and building new documents is fun. When working with one of the document types I noticed a problem where long texts would overlap due to lack of space in the structure of the document. While my scripts would work just fine, it would lose the data that may be useful. After this problem was resolved in how the documents were generated, this broke multiple of my scripts. I kinda regretted even mentioning the initial problem, and inadvertently said - "See, if I had access to the database, this wouldn't happen." And access to the database was granted with a project attached.

It is actually great. I am sure there will be many benefits exploring the source of data. I have a full month ahead to get familiar with all the tables and start putting things together. The goal is to gain better understanding of the database structure to build various reports, and potentially add improvements how documents are generated.

This one line SQL query only works with MSSQL. However, there are similar solutions for PostgreSQL, MySQL, and others. Please share your tips and tricks in exploring large databases programatically.



0
0
0.000

5 comments

I think its nice to see the SQL and Python working together. Most of the SQL languages are similar with some differences. So I think that it just depends on what you like to do. I like that Python makes it easier to manage the systems though.

0
0
0.000

Every problem has a solution, but it takes time and hard work. Just like you have just worked hard for a long time and found a solution to your problem, only those people succeed in life who keep working hard in the same way and never give up.

0
0
0.000

good to know how this works and thanks for sharing it.

0
0
0.000

Congratulations @geekgirl! You have completed the following achievement on the Hive blockchain And have been rewarded with New badge(s)

You got more than 16500 replies.
Your next target is to reach 17000 replies.

You can view your badges on your board and compare yourself to others in the Ranking
If you no longer want to receive notifications, reply to this comment with the word STOP

Check out our last posts:

Our Hive Power Delegations to the May PUM Winners
Feedback from the June Hive Power Up Day
Hive Power Up Month Challenge - May 2025 Winners List
0
0
0.000

The combination of SQL and Python makes our work much easier in large database analyses.

0
0
0.000