Programming languages allow us to communicate with computers, and they operate like sets of instructions. There are numerous types of languages, including procedural, functional, object-oriented, and more. Whether you’re looking to learn a new language or trying to find some tips or tricks, the resources in the Languages Zone will give you all the information you need and more.
Python is one of the most popular and versatile programming languages in the world. Known for its simplicity and readability, Python is an excellent choice for beginners and experienced developers alike. In this tutorial, we'll cover the fundamentals of Python programming, from basic syntax to more advanced concepts, to help you kickstart your journey into the world of programming. Introduction to Python Python is a high-level, interpreted programming language that emphasizes code readability and simplicity. It was created by Guido van Rossum and first released in 1991. Python's design philosophy focuses on code readability, with its clear and expressive syntax that makes it easy to learn and use. Setting Up Your Environment Before diving into Python programming, you'll need to set up your development environment. Python is compatible with various operating systems, including Windows, macOS, and Linux. You can download and install Python from the official Python website, which provides installers for different platforms. Once Python is installed, you can use the built-in IDLE (Integrated Development and Learning Environment) or choose from a variety of third-party code editors and IDEs (Integrated Development Environments) such as Visual Studio Code, PyCharm, or Sublime Text. Basic Syntax and Data Types Python uses a simple and intuitive syntax, making it easy to write and understand code. Let's start with some basic concepts: Variables and Data Types In Python, variables are used to store data values. You can assign a value to a variable using the equals sign (=). Python supports various data types, including: Integer: Whole numbers without any decimal point (e.g., 10, -5) Float: Numbers with a decimal point (e.g., 3.14, -0.5) String: Sequence of characters enclosed in single ('') or double ("") quotes (e.g., 'hello', "world") Python # Variable assignment x = 10 y = 3.14 name = 'Python' # Print variable values print(x) # Output: 10 print(y) # Output: 3.14 print(name) # Output: Python Basic Arithmetic Operations Python supports various arithmetic operations, including addition (+), subtraction (-), multiplication (*), division (/), and exponentiation (**). Python # Arithmetic operations a = 10 b = 5 print(a + b) # Output: 15 print(a - b) # Output: 5 print(a * b) # Output: 50 print(a / b) # Output: 2.0 print(a ** b) # Output: 100000 Control Flow Statements Control flow statements allow you to control the execution of your code based on certain conditions. Python supports several control flow statements, including: If...Else Statements The if...else statement allows you to execute a block of code conditionally based on a specified condition. Python # If...else statement age = 18 if age >= 18: print("You are eligible to vote.") else: print("You are not eligible to vote.") Loops Loops are used to iterate over a sequence of elements or execute a block of code repeatedly. Python # For loop for i in range(5): print(i) # Output: 0, 1, 2, 3, 4 # While loop num = 0 while num < 5: print(num) # Output: 0, 1, 2, 3, 4 num += 1 Functions and Modules Functions allow you to organize your code into reusable blocks, while modules are Python files containing functions, classes, and variables. You can import modules into your Python code to reuse their functionality, as shown below: Python # Function definition def greet(name): print("Hello, " + name + "!") # Function call greet("Python") # Output: Hello, Python! 
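To round out the modules point above (the code shows a user-defined function but not an import), here is a small optional sketch that reuses functionality from Python's standard-library math module; nothing project-specific is assumed:

Python
# Import a module from the standard library and reuse its functions
import math

print(math.sqrt(16))   # Output: 4.0
print(math.pi)         # Output: 3.141592653589793

# Import a single name from a module
from math import factorial
print(factorial(5))    # Output: 120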
Conclusion In this Python tutorial for beginners, we've covered the basics of Python programming, including setting up your environment, basic syntax and data types, control flow statements, functions, and modules. Python's simplicity and versatility make it an excellent choice for both beginners and experienced developers. Now that you've learned the fundamentals of Python, you're ready to explore more advanced topics and start building your own Python projects. Happy coding!
Documentation abounds for any topic or questions you might have, but when you try to apply something to your own uses, it suddenly becomes hard to find what you need. This problem doesn't only exist for you. In this blog post, we will look at how LangChain implements RAG so that you can apply the same principles to any application with LangChain and an LLM. What Is RAG? This term is used a lot in today's technical landscape, but what does it actually mean? Here are a few definitions from various sources: "Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model, so it references an authoritative knowledge base outside of its training data sources before generating a response." — Amazon Web Services "Retrieval-augmented generation (RAG) is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources." — NVIDIA "Retrieval-augmented generation (RAG) is an AI framework for improving the quality of LLM-generated responses by grounding the model on external sources of knowledge to supplement the LLM's internal representation of information." — IBM Research In this blog post, we'll be focusing on how to write the retrieval query that supplements or grounds the LLM's answer. We will use Python with LangChain, a framework used to write generative AI applications that interact with LLMs. The Data Set First, let's take a quick look at our data set. We'll be working with the SEC (Securities and Exchange Commission) filings from the EDGAR (Electronic Data Gathering, Analysis, and Retrieval system) database. The SEC filings are a treasure trove of information, containing financial statements, disclosures, and other important information about publicly-traded companies. The data contains companies who have filed financial forms (10k, 13, etc.) with the SEC. Different managers own stock in these companies, and the companies are part of different industries. In the financial forms themselves, various people are mentioned in the text, and we have broken the text down into smaller chunks for the vector search queries to handle. We have taken each chunk of text in a form and created a vector embedding that is also stored on the CHUNK node. When we run a vector search query, we will compare the vector of the query to the vector of the CHUNK nodes to find the most similar text. Let's see how to construct our query! Retrieval Query Examples I used a few sources to help me understand how to write a retrieval query in LangChain. The first was a blog post by Tomaz Bratanic, who wrote a post on how to work with the Neo4j vector index in LangChain using Wikipedia article data. The second was a query from the GenAI Stack, which is a collection of demo applications built with Docker and utilizes the StackOverflow data set containing technical questions and answers. Both queries are included below. 
Python # Tomaz's blog post retrieval query retrieval_query = """ OPTIONAL MATCH (node)<-[:EDITED_BY]-(p) WITH node, score, collect(p) AS editors RETURN node.info AS text, score, node {.*, vector: Null, info: Null, editors: editors} AS metadata """ # GenAI Stack retrieval query retrieval_query=""" WITH node AS question, score AS similarity CALL { with question MATCH (question)<-[:ANSWERS]-(answer) WITH answer ORDER BY answer.is_accepted DESC, answer.score DESC WITH collect(answer)[..2] as answers RETURN reduce(str='', answer IN answers | str + '\n### Answer (Accepted: '+ answer.is_accepted + ' Score: ' + answer.score+ '): '+ answer.body + '\n') as answerTexts } RETURN '##Question: ' + question.title + '\n' + question.body + '\n' + answerTexts AS text, similarity as score, {source: question.link} AS metadata ORDER BY similarity ASC // so that best answers are the last """ Now, notice that these queries do not look complete. We wouldn't start a Cypher query with an OPTIONAL MATCH or WITH clause. This is because the retrieval query is added on to the end of the vector search query. Tomaz's post shows us the implementation of the vector search query. Python read_query = ( "CALL db.index.vector.queryNodes($index, $k, $embedding) " "YIELD node, score " ) + retrieval_query So LangChain first calls the db.index.vector.queryNodes() procedure (more info in documentation) to find the most similar nodes and passes (YIELD) the similar node and the similarity score, and then it adds the retrieval query to the end of the vector search query to pull additional context. This is very helpful to know, especially as we construct the retrieval query, and for when we start testing results! The second thing to note is that both queries return the same three variables: text, score, and metadata. This is what LangChain expects, so you will get errors if those are not returned. The text variable contains the related text, the score is the similarity score for the chunk against the search text, and the metadata can contain any additional information that we want for context. Constructing the Retrieval Query Let's build our retrieval query! We know the similarity search query will return the node and score variables, so we can pass those into our retrieval query to pull connected data of those similar nodes. We also have to return the text, score, and metadata variables. Python retrieval_query = """ WITH node AS doc, score as similarity # some more query here RETURN <something> as text, similarity as score, {<something>: <something>} AS metadata """ Ok, there's our skeleton. Now what do we want in the middle? We know our data model will pull CHUNK nodes in the similarity search (those will be the node AS doc values in our WITH clause above). Chunks of text don't give a lot of context, so we want to pull in the Form, Person, Company, Manager, and Industry nodes that are connected to the CHUNK nodes. We also include a sequence of text chunks on the NEXT relationship, so we can pull the next and previous chunks of text around a similar one. We also will pull all the chunks with their similarity scores, and we want to narrow that down a bit...maybe just the top 5 most similar chunks. 
Python retrieval_query = """ WITH node AS doc, score as similarity ORDER BY similarity DESC LIMIT 5 CALL { WITH doc OPTIONAL MATCH (prevDoc:Chunk)-[:NEXT]->(doc) OPTIONAL MATCH (doc)-[:NEXT]->(nextDoc:Chunk) RETURN prevDoc, doc AS result, nextDoc } # some more query here RETURN coalesce(prevDoc.text,'') + coalesce(document.text,'') + coalesce(nextDoc.text,'') as text, similarity as score, {<something>: <something>} AS metadata """ Now we keep the 5 most similar chunks, then pull the previous and next chunks of text in the CALL {} subquery. We also change the RETURN to concatenate the text of the previous, current, and next chunks all into text variable. The coalesce() function is used to handle null values, so if there is no previous or next chunk, it will just return an empty string. Let's add a bit more context to pull in the other related entities in the graph. Python retrieval_query = """ WITH node AS doc, score as similarity ORDER BY similarity DESC LIMIT 5 CALL { WITH doc OPTIONAL MATCH (prevDoc:Chunk)-[:NEXT]->(doc) OPTIONAL MATCH (doc)-[:NEXT]->(nextDoc:Chunk) RETURN prevDoc, doc AS result, nextDoc } WITH result, prevDoc, nextDoc, similarity CALL { WITH result OPTIONAL MATCH (result)-[:PART_OF]->(:Form)<-[:FILED]-(company:Company), (company)<-[:OWNS_STOCK_IN]-(manager:Manager) WITH result, company.name as companyName, apoc.text.join(collect(manager.managerName),';') as managers WHERE companyName IS NOT NULL OR managers > "" WITH result, companyName, managers ORDER BY result.score DESC RETURN result as document, result.score as popularity, companyName, managers } RETURN coalesce(prevDoc.text,'') + coalesce(document.text,'') + coalesce(nextDoc.text,'') as text, similarity as score, {documentId: coalesce(document.chunkId,''), company: coalesce(companyName,''), managers: coalesce(managers,''), source: document.source} AS metadata """ The second CALL {} subquery pulls in any related Form, Company, and Manager nodes (if they exist, OPTIONAL MATCH). We collect the managers into a list and ensure the company name and manager list are not null or empty. We then order the results by a score (doesn't currently provide value but could track how many times the doc has been retrieved). Since only the text, score, and metadata properties get returned, we will need to map these extra values (documentId, company, and managers) in the metadata dictionary field. This means updating the final RETURN statement to include those. Wrapping Up! In this post, we looked at what RAG is and how retrieval queries work in LangChain. We also looked at a few examples of Cypher retrieval queries for Neo4j and constructed our own. We used the SEC filings data set for our query and saw how to pull extra context and return it mapped to the three properties LangChain expects. If you are building or interested in more Generative AI content, check out the resources linked below. Happy coding! Resources Demo application: Demo project that uses this retrieval query GitHub repository: Code for demo app that includes retrieval query Documentation: LangChain for Neo4j vector store Free online courses: Graphacademy: LLMs + Neo4j
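As a supplement to the post above (not part of the original resources), here is a minimal sketch of how a retrieval query like the one constructed here might be passed to LangChain's Neo4j vector store. The package paths, constructor arguments, connection details, and index name below are assumptions based on the langchain_community integration and should be checked against the current LangChain documentation:

Python
from langchain_community.vectorstores import Neo4jVector
from langchain_openai import OpenAIEmbeddings

# Assumed placeholders: connection details and index name depend on your setup
vector_store = Neo4jVector.from_existing_index(
    embedding=OpenAIEmbeddings(),
    url="bolt://localhost:7687",
    username="neo4j",
    password="password",
    index_name="chunk_embeddings",
    retrieval_query=retrieval_query,  # the query constructed above
)

# Each result's page_content is the `text` value returned by the retrieval query
docs_with_scores = vector_store.similarity_search_with_score("Which companies are mentioned?", k=5)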
In today's digital world, where repetitive jobs can be both time-consuming and tedious, automation shines as a beacon of efficiency. Python, with its simplicity and powerful libraries, is an ideal tool for automating routine web activities such as filling out forms and extracting information. This guide shows how to use Python for web automation, covering environment setup, basic scripts, and a detailed demonstration of form automation with Selenium.

Set up a Virtual Environment

Before diving into automation scripts, make sure Python is installed on your machine. The latest Python 3 versions are recommended for their improved features and support. You can download Python directly from the official site. After installing, it's a good idea to create a virtual environment for each project so that its dependencies are managed cleanly.

A virtual environment is a self-contained folder that holds the interpreter and packages for your project, keeping your work and required software separate from other projects. This protects your project from unintended changes, as each one has its own dedicated copies of Python and any libraries or tools it needs, so nothing can accidentally interact or interfere between projects.

Open a terminal or command prompt: Access your terminal (Linux/macOS) or command prompt (Windows).
Create a virtual environment: Navigate to your project directory and run python -m venv myautomationenv to create a virtual environment named myautomationenv.
Activate the virtual environment: Before installing packages, activate the environment with: Windows: myautomationenv\Scripts\activate macOS/Linux: source myautomationenv/bin/activate

Shell
# Create a virtual environment (the third-party virtualenv package is an alternative)
python -m venv myautomationenv

# Activate the virtual environment
# On Windows
myautomationenv\Scripts\activate
# On macOS/Linux
source myautomationenv/bin/activate

Introduction to Selenium for Web Automation

Selenium is a widely used tool for controlling web browsers from code. It lets your program perform common browser actions, such as clicking buttons, completing forms, and extracting information from websites. To use Selenium with Python, you must:

Install Selenium: Run pip install selenium in your terminal.
Download WebDriver: Based on your browser (e.g., Chrome, Firefox), download the corresponding WebDriver and note its path.

Automating a Simple Login Form

Let's start with a basic example of automating a login form:

Python
from selenium import webdriver
import time

driver_path = 'path/to/your/webdriver'
driver = webdriver.Chrome(driver_path)

driver.get('https://example.com/login')

username_field = driver.find_element_by_name('username')
password_field = driver.find_element_by_name('password')

username_field.send_keys('your_username')
password_field.send_keys('your_password')

submit_button = driver.find_element_by_id('submit')
submit_button.click()

time.sleep(5)
driver.quit()

Detailed Form Automation Example

Consider a form with fields for name, age, sex, and address.
We'll extend our automation to fill out this detailed form: Python from selenium import webdriver from selenium.webdriver.common.keys import Keys from selenium.webdriver.support.ui import Select import time driver_path = 'path/to/your/webdriver' driver = webdriver.Chrome(driver_path) driver.get('https://example.com/form') name_field = driver.find_element_by_name('name') age_field = driver.find_element_by_name('age') sex_select = Select(driver.find_element_by_name('sex')) address_field = driver.find_element_by_name('address') name_field.send_keys('John Doe') age_field.send_keys('30') sex_select.select_by_visible_text('Male') address_field.send_keys('123 Main St, Anytown, USA') submit_button = driver.find_element_by_id('submit') submit_button.click() time.sleep(5) driver.quit() Tips for Effective Automation With Python Start small: Begin with straightforward tasks and progress to more complex scripts. Error handling: Use try-except blocks to manage errors gracefully. Logging: Implement logging to track your script's actions, aiding in debugging. Scheduling: Automate scripts to run at specific times using cron jobs or Task Scheduler. Conclusion Automating tasks with Python and Selenium can help accomplish a lot online. Whether completing forms or retrieving information, these tools allow working smarter instead of harder. As one learns more about Python's automation powers, new opportunities arise every day to apply them to routine jobs. This frees up energy for more difficult problems. It is important to automate judiciously and follow all website rules.
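One addendum to the scripts above: the find_element_by_name and find_element_by_id helpers were removed in recent Selenium 4 releases, and Selenium 4.6+ can resolve the browser driver for you. Here is a sketch of the same login flow using the current locator API and an explicit wait instead of a fixed sleep (the URL and element names are the same placeholders as above):

Python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # Selenium Manager (4.6+) resolves the driver automatically
try:
    driver.get('https://example.com/login')
    wait = WebDriverWait(driver, 10)
    # Wait for the form to load instead of sleeping a fixed number of seconds
    wait.until(EC.presence_of_element_located((By.NAME, 'username'))).send_keys('your_username')
    driver.find_element(By.NAME, 'password').send_keys('your_password')
    driver.find_element(By.ID, 'submit').click()
finally:
    driver.quit()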
Transport Layer Security (TLS) and Secure Sockets Layer (SSL) are cryptographic protocols that ensure secure data communication over a network. Different versions of these protocols exist, with some having known vulnerabilities, making it critical to verify the TLS/SSL version in use. Below, we will explore how to check the TLS and SSL versions of applications in JavaScript, Python, and other programming languages.

Checking TLS/SSL Version in JavaScript

In JavaScript, the version of TLS or SSL in use depends on the browser or server the script communicates with. Therefore, we can't explicitly check the SSL/TLS version in JavaScript code. However, you can use online tools like SSL Labs' SSL Test to check the SSL/TLS version supported by your server.

Checking TLS/SSL Version in Python

In Python, you can use the built-in ssl module to check the SSL/TLS version. Here's a simple script to connect to a server and print the SSL/TLS version:

Python
import socket
import ssl

hostname = 'www.example.com'
context = ssl.create_default_context()

with socket.create_connection((hostname, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=hostname) as ssock:
        print(ssock.version())

This script creates a secure connection to the specified hostname and prints the SSL/TLS version of the connection.

Checking TLS/SSL Version in Java

In Java, the version of TLS or SSL used can be determined through the SSLContext class. You can check the default SSL/TLS protocol version used by your Java application with the following code:

Java
import javax.net.ssl.SSLContext;

public class Main {
    public static void main(String[] args) throws Exception {
        SSLContext context = SSLContext.getDefault();
        System.out.println("Default SSL/TLS protocol: " + context.getProtocol());
    }
}

This code will print the default SSL/TLS protocol used by SSLContext.

Checking TLS/SSL Version in C#

In C#, you can check the TLS/SSL version by using the System.Net.Security.SslStream class. Here's an example:

C#
using System;
using System.Net.Security;
using System.Net.Sockets;

class Program {
    static void Main() {
        TcpClient client = new TcpClient("www.example.com", 443);
        SslStream sslStream = new SslStream(client.GetStream());
        sslStream.AuthenticateAsClient("www.example.com");
        Console.WriteLine("SSL Protocol: " + sslStream.SslProtocol);
    }
}

This program creates a secure TCP connection to the specified hostname, and then prints the SSL/TLS version of the connection.

Checking TLS/SSL Version at the Operating System Level

Depending on the operating system in use, different methods are available to check the TLS and SSL versions.

On Linux and Unix-Based Systems

Use the openssl command-line tool. The following command will connect to a server and return the SSL/TLS version and cipher:

Shell
openssl s_client -connect www.example.com:443

In this command, replace www.example.com with the hostname of the server you want to check.

On Windows

Use the Test-NetConnection cmdlet in PowerShell. This cmdlet doesn't directly provide the SSL/TLS version, but it can be used to test if a server supports a particular version:

PowerShell
Test-NetConnection -ComputerName www.example.com -Port 443 -Tls12

This command tests if the server at www.example.com supports TLS 1.2. You can change '-Tls12' to '-Tls11', '-Tls', or '-Ssl3' to test for those versions.

Checking TLS/SSL Version in Docker

To check the TLS/SSL version inside a Docker container, you can use the openssl tool, similar to a Linux system.
First, you need to connect to the Docker container:

Shell
docker exec -it [container-id] /bin/bash

Replace [container-id] with the ID of your running Docker container. This command will open a bash shell inside the Docker container. Then, you can run the openssl command as before:

Shell
openssl s_client -connect www.example.com:443

If openssl is not installed in your Docker container, you can install it using the package manager for the Linux distribution your Docker container is based on. For example, if your container is based on an Ubuntu image, you can use apt-get:

Shell
apt-get update
apt-get install openssl

Once openssl is installed, you can use it to check the SSL/TLS version as described above.

Conclusion

Understanding the SSL/TLS version that your application uses is crucial for maintaining secure communication. Depending on the programming language, the SSL/TLS version can be determined either directly within the code or indirectly using online tools. It's essential to stay updated with the latest SSL/TLS versions to ensure your application's security. Please note that it is always recommended to use the most recent version of TLS for security reasons. Older versions such as SSL 2.0, SSL 3.0, and even TLS 1.0 are considered insecure and should not be used.
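As an addendum to the language-specific checks above, Python's ssl module can also be used to probe which TLS versions a server accepts by pinning a connection to one version at a time. This is a sketch; the hostname is a placeholder, and very old protocol versions may be rejected by your local OpenSSL policy rather than by the server:

Python
import socket
import ssl

def server_supports(hostname, version):
    # Pin both the minimum and maximum version so only `version` is offered
    context = ssl.create_default_context()
    context.minimum_version = version
    context.maximum_version = version
    try:
        with socket.create_connection((hostname, 443), timeout=5) as sock:
            with context.wrap_socket(sock, server_hostname=hostname):
                return True
    except (ssl.SSLError, OSError):
        return False

host = 'www.example.com'
for version in (ssl.TLSVersion.TLSv1_2, ssl.TLSVersion.TLSv1_3):
    print(version.name, "supported:", server_supports(host, version))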
Upgrading to major versions of Python (e.g., 3.12) can be non-trivial; here's a good article for reference. I recently upgraded API Logic Server, and offer this information in hopes it can make things a bit easier for you.

Aside: API Logic Server is open source. It creates executable API/Admin App projects from a database with 1 command. Customize with rules and Python in your IDE.

There were 2 areas that required attention:

Packaging: Preparing a project for pip install. This issue was unique to Python 3.12: the old setup procedures have been removed.
Dependent libraries: This is a consideration for any new release. In general, I found this page helpful. My project is database-oriented (using SQLAlchemy), so key risk areas usually involve database access. MySQL and Oracle are generally straightforward, but I always need to address Postgres (psycopg) and SQL/Server (pyodbc). These affect requirements.txt and product packaging.

Let's consider packaging first.

Project Packaging

My project requires packaging for PyPI. This has changed in Python 3.12. Let's go over some quick background. To make a package available for pip install, you must upload it to PyPI. Here's an uploaded example. This is a 2-step process, as follows:

Build local install files: This gathers your dependent libraries, CLI entry points, and so forth.
Upload to PyPI: This is unchanged: python3 -m twine upload --skip-existing dist/*.

The first step has changed in two ways: how you run the setup process, and how you specify your dependent libraries.

Run Setup (Dependencies)

This process prepares for python3 -m twine upload... by creating local files that identify the libraries you require, CLI entry points, and so forth. In the past, you ran python3 setup.py sdist bdist_wheel. That is no longer supported. It has been replaced by:

python3 -m build

pyproject.toml (Not setup.py)

In the past, your setup.py file identified the libraries you require, CLI entry points, and so forth. setup.py is no longer supported in Python 3.12. Instead, you must provide a pyproject.toml file, as described in this guide. The python3 -m build command uses this file.

For me, this set off a mild panic: I was unable to find a setup-to-toml migration utility, except for those looking to replace the entire pip install workflow. As it turned out, migrating setup.py was not so painful by hand; it was mainly a series of copy/paste procedures. Here's a working pyproject.toml, shown in the diagram below.

psycopg2-binary: Postgres

This is used by SQLAlchemy for Postgres access. In addition to pyproject.toml, I had to change requirements.txt, as shown here. I changed psycopg2-binary==2.9.5 to:

psycopg2-binary==2.9.9

My project is large, so I found it convenient to create a small venv and test the install. It took a few tries to straighten out the -binary bit.

odbc: SQL/Server

Microsoft SQL/Server requires 3 packages (this on a Mac):

unixodbc

Install unixodbc. You might get:

==> Running `brew cleanup unixodbc`...
Disable this behaviour by setting HOMEBREW_NO_INSTALL_CLEANUP.
Hide these hints with HOMEBREW_NO_ENV_HINTS (see `man brew`).
Removing: /opt/homebrew/Cellar/unixodbc/2.3.11... (48 files, 2.3MB)
Warning: The following dependents of upgraded formulae are outdated but will not be upgraded because they are not bottled:
  msodbcsql18
(venv) val@Vals-MPB-14 Desktop %

It seemed to work.

odbc driver

I required the Microsoft odbc driver.

pyodbc

This is used by SQLAlchemy. In requirements.txt and pyproject.toml, I had to change pyodbc==4.0.34 --> pyodbc==5.0.0.
Minor Issues: Escape Characters

As noted in the Python docs, invalid escape sequences in strings (e.g., a stray backslash left behind when you mistype \n) were previously easy to miss; Python 3.12 now flags them with a SyntaxWarning. I mention this because unexpected messages can show up when you start your program under the debugger.
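A small illustration of the kind of message this produces; the regular-expression pattern here is just an example, and the warning appears when the module is compiled:

Python
import re

pattern = "\d+"    # Python 3.12 reports: SyntaxWarning: invalid escape sequence '\d'
pattern = r"\d+"   # raw string: the backslash is kept literally, no warning
print(re.findall(pattern, "order 66"))  # ['66']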
In the rapidly evolving landscape of artificial intelligence, text generation models have emerged as a cornerstone, revolutionizing how we interact with machine learning technologies. Among these models, GPT-4 stands out, showcasing an unprecedented ability to understand and generate human-like text. This article delves into the basics of text generation using GPT-4, providing Python code examples to guide beginners in creating their own AI-driven text generation applications. Understanding GPT-4 GPT-4, or Generative Pre-trained Transformer 4, represents the latest advancement in OpenAI's series of text generation models. It builds on the success of its predecessors by offering more depth and a nuanced understanding of context, making it capable of producing text that closely mimics human writing in various styles and formats. At its core, GPT-4 operates on the principles of deep learning, utilizing a transformer architecture. This architecture enables the model to pay attention to different parts of the input text differently, allowing it to grasp the nuances of language and generate coherent, contextually relevant responses. Getting Started With GPT-4 and Python To experiment with GPT-4, one needs access to OpenAI's API, which provides a straightforward way to utilize the model without the need to train it from scratch. The following Python code snippet demonstrates how to use the OpenAI API to generate text with GPT-4: Python from openai import OpenAI # Set OpenAI API key client = OpenAI(api_key = 'you_api_key_goes_here') #Get your key at https://platform.openai.com/api-keys response = client.chat.completions.create( model="gpt-4-0125-preview", # The Latest GPT-4 model. Trained with data till end of 2023 messages =[{'role':'user', 'content':"Write a short story about a robot saving earth from Aliens."}], max_tokens=250, # Response text length. temperature=0.6, # Ranges from 0 to 2, lower values ==> Determinism, Higher Values ==> Randomness top_p=1, # Ranges 0 to 1. Controls the pool of tokens. Lower ==> Narrower selection of words frequency_penalty=0, # used to discourage the model from repeating the same words or phrases too frequently within the generated text presence_penalty=0) # used to encourage the model to include a diverse range of tokens in the generated text. print(response.choices[0].message.content) In this example, we use the client.chat.completions.create function to generate text. The model parameter specifies which version of the model to use, with "gpt-4-0125-preview" representing the latest GPT-4 preview that is trained with the data available up to Dec 2023. The messages parameter feeds the initial text to the model, serving as the basis for the generated content. Other parameters like max_tokens, temperature, and top_p allow us to control the length and creativity of the output. Applications and Implications The applications of GPT-4 extend far beyond simple text generation. Industries ranging from entertainment to customer service find value in its ability to create compelling narratives, generate informative content, and even converse with users in a natural manner. However, as we integrate these models more deeply into our digital experiences, ethical considerations come to the forefront. Issues such as bias, misinformation, and the potential for misuse necessitate a thoughtful approach to deployment and regulation. 
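As an optional extension of the example above (reusing the same client and model), two commonly used variations are adding a system message to steer the model's behavior and streaming the response as it is generated rather than waiting for the full completion:

Python
# Stream a completion, with a system message steering the style
stream = client.chat.completions.create(
    model="gpt-4-0125-preview",
    messages=[
        {'role': 'system', 'content': "You are a concise science-fiction writer."},
        {'role': 'user', 'content': "Write a short story about a robot saving earth from Aliens."},
    ],
    max_tokens=250,
    temperature=0.6,
    stream=True,  # yields chunks as they are generated
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)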
Conclusion GPT-4's capabilities represent a significant leap forward in the field of artificial intelligence, offering tools that can understand and generate human-like text with remarkable accuracy. The Python example provided herein serves as a starting point for exploring the vast potential of text generation models. As we continue to push the boundaries of what AI can achieve, it remains crucial to navigate the ethical landscape with care, ensuring that these technologies augment human creativity and knowledge rather than detract from it. In summary, GPT-4 not only showcases the power of modern AI but also invites us to reimagine the future of human-computer interaction. With each advancement, we step closer to a world where machines understand not just the words we say but the meaning and emotion behind them, unlocking new possibilities for creativity, efficiency, and understanding.
Function pipelines allow seamless execution of multiple functions in a sequential manner, where the output of one function serves as the input to the next. This approach helps in breaking down complex tasks into smaller, more manageable steps, making code more modular, readable, and maintainable. Function pipelines are commonly used in functional programming paradigms to transform data through a series of operations. They promote a clean and functional style of coding, emphasizing the composition of functions to achieve desired outcomes. In this article, we will explore the fundamentals of function pipelines in Python, including how to create and use them effectively. We'll discuss techniques for defining pipelines, composing functions, and applying pipelines to real-world scenarios. Creating Function Pipelines in Python In this segment, we'll explore two instances of function pipelines. In the initial example, we'll define three functions—'add', 'multiply', and 'subtract'—each designed to execute a fundamental arithmetic operation as implied by its name. Python def add(x, y): return x + y def multiply(x, y): return x * y def subtract(x, y): return x - y Next, create a pipeline function that takes any number of functions as arguments and returns a new function. This new function applies each function in the pipeline to the input data sequentially. Python # Pipeline takes multiple functions as argument and returns an inner function def pipeline(*funcs): def inner(data): result = data # Iterate thru every function for func in funcs: result = func(result) return result return inner Let’s understand the pipeline function. The pipeline function takes any number of functions (*funcs) as arguments and returns a new function (inner). The inner function accepts a single argument (data) representing the input data to be processed by the function pipeline. Inside the inner function, a loop iterates over each function in the funcs list. For each function func in the funcs list, the inner function applies func to the result variable, which initially holds the input data. The result of each function call becomes the new value of result. After all functions in the pipeline have been applied to the input data, the inner function returns the final result. Next, we create a function called ‘calculation_pipeline’ that passes the ‘add’, ‘multiply’ and ‘substract’ to the pipeline function. Python # Create function pipeline calculation_pipeline = pipeline( lambda x: add(x, 5), lambda x: multiply(x, 2), lambda x: subtract(x, 10) ) Then we can test the function pipeline by passing an input value through the pipeline. Python result = calculation_pipeline(10) print(result) # Output: 20 We can visualize the concept of a function pipeline through a simple diagram. 
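As a side note, the same pipeline helper can be written more compactly with functools.reduce; this sketch reuses the add, multiply, and subtract functions defined above:

Python
from functools import reduce

def pipeline(*funcs):
    # Apply each function to the running result, starting from the input data
    return lambda data: reduce(lambda result, func: func(result), funcs, data)

calculation_pipeline = pipeline(
    lambda x: add(x, 5),
    lambda x: multiply(x, 2),
    lambda x: subtract(x, 10),
)
print(calculation_pipeline(10))  # Output: 20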
Another example:

Python
def validate(text):
    if text is None or not text.strip():
        print("String is null or empty")
    else:
        return text

def remove_special_chars(text):
    for char in "!@#$%^&*()_+{}[]|\":;'<>?,./":
        text = text.replace(char, "")
    return text

def capitalize_string(text):
    return text.upper()

# Pipeline takes multiple functions as argument and returns an inner function
def pipeline(*funcs):
    def inner(data):
        result = data
        # Iterate thru every function
        for func in funcs:
            result = func(result)
        return result
    return inner

# Create function pipeline
str_pipeline = pipeline(
    lambda x: validate(x),
    lambda x: remove_special_chars(x),
    lambda x: capitalize_string(x)
)

Testing the pipeline by passing the correct input:

Python
# Test the function pipeline
result = str_pipeline("Test@!!!%#Abcd")
print(result)  # TESTABCD

In case of an empty or null string:

Python
result = str_pipeline("")
print(result)  # Prints "String is null or empty", then raises AttributeError because validate() returned None

In the example, we've established a pipeline that begins by validating the input to ensure it's not empty. If the input passes this validation, it proceeds to the 'remove_special_chars' function, followed by the 'capitalize_string' function.

Benefits of Creating Function Pipelines

Function pipelines encourage modular code design by breaking down complex tasks into smaller, composable functions. Each function in the pipeline focuses on a specific operation, making it easier to understand and modify the code. By chaining together functions in a sequential manner, function pipelines promote clean and readable code, making it easier for other developers to understand the logic and intent behind the data processing workflow. Function pipelines are flexible and adaptable, allowing developers to easily modify or extend existing pipelines to accommodate changing requirements.
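As a variation on the string example above (reusing the pipeline, remove_special_chars, and capitalize_string functions), the validation step can raise an exception instead of printing, so an empty input stops the pipeline with a clear error rather than passing None to the next step:

Python
def validate_or_raise(text):
    if text is None or not text.strip():
        raise ValueError("String is null or empty")
    return text

str_pipeline = pipeline(validate_or_raise, remove_special_chars, capitalize_string)

print(str_pipeline("Test@!!!%#Abcd"))  # TESTABCD
# str_pipeline("") now raises ValueError immediately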
Google BigQuery is a powerful cloud-based data warehousing solution that enables users to analyze massive datasets quickly and efficiently. In Python, BigQuery DataFrames provide a Pythonic interface for interacting with BigQuery, allowing developers to leverage familiar tools and syntax for data querying and manipulation. In this comprehensive developer guide, we'll explore the usage of BigQuery DataFrames, their advantages, disadvantages, and potential performance issues. Introduction To BigQuery DataFrames BigQuery DataFrames serve as a bridge between Google BigQuery and Python, allowing seamless integration of BigQuery datasets into Python workflows. With BigQuery DataFrames, developers can use familiar libraries like Pandas to query, analyze, and manipulate BigQuery data. This Pythonic approach simplifies the development process and enhances productivity for data-driven applications. Advantages of BigQuery DataFrames Pythonic Interface: BigQuery DataFrames provide a Pythonic interface for interacting with BigQuery, enabling developers to use familiar Python syntax and libraries. Integration With Pandas: Being compatible with Pandas, BigQuery DataFrames allow developers to leverage the rich functionality of Pandas for data manipulation. Seamless Query Execution: BigQuery DataFrames handle the execution of SQL queries behind the scenes, abstracting away the complexities of query execution. Scalability: Leveraging the power of Google Cloud Platform, BigQuery DataFrames offer scalability to handle large datasets efficiently. Disadvantages of BigQuery DataFrames Limited Functionality: BigQuery DataFrames may lack certain advanced features and functionalities available in native BigQuery SQL. Data Transfer Costs: Transferring data between BigQuery and Python environments may incur data transfer costs, especially for large datasets. API Limitations: While BigQuery DataFrames provide a convenient interface, they may have limitations compared to directly using the BigQuery API for complex operations. Prerequisites Google Cloud Platform (GCP) Account: Ensure an active GCP account with BigQuery access. Python Environment: Set up a Python environment with the required libraries (pandas, pandas_gbq, and google-cloud-bigquery). Project Configuration: Configure your GCP project and authenticate your Python environment with the necessary credentials. 
Using BigQuery DataFrames

Install Required Libraries

Install the necessary libraries using pip:

Shell
pip install pandas pandas-gbq google-cloud-bigquery

Authenticate GCP Credentials

Authenticate your GCP credentials to enable interaction with BigQuery:

Python
import google.auth

# Load Application Default Credentials for GCP
credentials, project_id = google.auth.default()

Querying BigQuery DataFrames

Use pandas_gbq to execute SQL queries and retrieve results as a DataFrame:

Python
import pandas_gbq

# SQL Query
query = "SELECT * FROM `your_project_id.your_dataset_id.your_table_id`"

# Execute Query and Retrieve DataFrame
df = pandas_gbq.read_gbq(query, project_id="your_project_id", credentials=credentials)

Writing to BigQuery

Write a DataFrame to a BigQuery table using pandas_gbq:

Python
# Write DataFrame to BigQuery
pandas_gbq.to_gbq(df, destination_table="your_project_id.your_dataset_id.your_new_table", project_id="your_project_id", if_exists="replace", credentials=credentials)

Advanced Features

SQL Parameters

Pass parameters to your SQL queries dynamically (check the pandas-gbq documentation for the parameter-passing mechanism supported by your version):

Python
params = {"param_name": "param_value"}
query = "SELECT * FROM `your_project_id.your_dataset_id.your_table_id` WHERE column_name = @param_name"
df = pandas_gbq.read_gbq(query, project_id="your_project_id", credentials=credentials, dialect="standard", parameters=params)

Schema Customization

Customize the DataFrame schema during the write operation:

Python
schema = [{"name": "column_name", "type": "INTEGER"}, {"name": "another_column", "type": "STRING"}]
pandas_gbq.to_gbq(df, destination_table="your_project_id.your_dataset_id.your_custom_table", project_id="your_project_id", if_exists="replace", credentials=credentials, table_schema=schema)

Performance Considerations

Data Volume: Performance may degrade with large datasets, especially when processing and transferring data between BigQuery and Python environments.
Query Complexity: Complex SQL queries may lead to longer execution times, impacting overall performance.
Network Latency: Network latency between the Python environment and BigQuery servers can affect query execution time, especially for remote connections.

Best Practices for Performance Optimization

Use Query Filters: Apply filters to SQL queries to reduce the amount of data transferred between BigQuery and Python.
Optimize SQL Queries: Write efficient SQL queries to minimize query execution time and reduce resource consumption.
Cache Query Results: Cache query results in BigQuery to avoid re-executing queries for repeated requests.

Conclusion

BigQuery DataFrames offer a convenient and Pythonic way to interact with Google BigQuery, providing developers with flexibility and ease of use. While they offer several advantages, developers should be aware of potential limitations and performance considerations. By following best practices and optimizing query execution, developers can harness the full potential of BigQuery DataFrames for data analysis and manipulation in Python.
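As a supplement to the pandas-gbq examples above, the google-cloud-bigquery client can run parameterized queries directly and return a DataFrame; this sketch uses the same placeholder project, dataset, table, and parameter names:

Python
from google.cloud import bigquery

client = bigquery.Client(project="your_project_id")

job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("param_name", "STRING", "param_value"),
    ]
)
sql = """
    SELECT *
    FROM `your_project_id.your_dataset_id.your_table_id`
    WHERE column_name = @param_name
"""
# to_dataframe() needs a pandas-compatible environment (e.g., the db-dtypes package)
df = client.query(sql, job_config=job_config).to_dataframe()
print(df.head())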
A sign of a good understanding of a programming language is not whether one is simply knowledgeable about the language’s functionality, but why such functionality exists. Without knowing this “why," the developer runs the risk of using functionality in situations where its use might not be ideal - or even should be avoided in its entirety! The case in point for this article is the lateinit keyword in Kotlin. Its presence in the programming language is more or less a way to resolve what would otherwise be contradictory goals for Kotlin: Maintain compatibility with existing Java code and make it easy to transcribe from Java to Kotlin. If Kotlin were too dissimilar to Java - and if the interaction between Kotlin and Java code bases were too much of a hassle - then adoption of the language might have never taken off. Prevent developers from declaring class members without explicitly declaring their value, either directly or via constructors. In Java, doing so will assign a default value, and this leaves non-primitives - which are assigned a null value - at the risk of provoking a NullPointerException if they are accessed without a value being provided beforehand. The problem here is this: what happens when it’s impossible to declare a class field’s value immediately? Take, for example, the extension model in the JUnit 5 testing framework. Extensions are a tool for creating reusable code that conducts setup and cleanup actions before and after the execution of each or all tests. Below is an example of an extension whose purpose is to clear out all designated database tables after the execution of each test via a Spring bean that serves as the database interface: Java public class DBExtension implements BeforeAllCallback, AfterEachCallback { private NamedParameterJdbcOperations jdbcOperations; @Override public void beforeAll(ExtensionContext extensionContext) { jdbcOperations = SpringExtension.getApplicationContext(extensionContext) .getBean(NamedParameterJdbcTemplate.class); clearDB(); } @Override public void afterEach(ExtensionContext extensionContext) throws Exception { clearDB(); } private void clearDB() { Stream.of("table_one", "table_two", "table_three").forEach((tableName) -> jdbcOperations.update("TRUNCATE " + tableName, new MapSqlParameterSource()) ); } } (NOTE: Yes, using the @Transactional annotation is possible for tests using Spring Boot tests that conduct database transactions, but some use cases make automated transaction roll-backs impossible; for example, when a separate thread is spawned to execute the code for the database interactions.) Given that the field jdbcOperations relies on the Spring framework loading the proper database interface bean when the application is loaded, it cannot be assigned any substantial value upon declaration. Thus, it receives an implicit default value of null until the beforeAll() function is executed. 
As described above, this approach is forbidden in Kotlin, so the developer has three options: Declare jdbcOperations as var, assign a garbage value to it in its declaration, then assign the “real” value to the field in beforeAll(): Kotlin class DBExtension : BeforeAllCallback, AfterEachCallback { private var jdbcOperations: NamedParameterJdbcOperations = StubJdbcOperations() override fun beforeAll(extensionContext: ExtensionContext) { jdbcOperations = SpringExtension.getApplicationContext(extensionContext) .getBean(NamedParameterJdbcOperations::class.java) clearDB() } override fun afterEach(extensionContext: ExtensionContext) { clearDB() } private fun clearDB() { listOf("table_one", "table_two", "table_three").forEach { tableName: String -> jdbcOperations.update("TRUNCATE $tableName", MapSqlParameterSource()) } } } The downside here is that there’s no check for whether the field has been assigned the “real” value, running the risk of invalid behavior when the field is accessed if the “real” value hasn’t been assigned for whatever reason. 2. Declare jdbcOperations as nullable and assign null to the field, after which the field will be assigned its “real” value in beforeAll(): Kotlin class DBExtension : BeforeAllCallback, AfterEachCallback { private var jdbcOperations: NamedParameterJdbcOperations? = null override fun beforeAll(extensionContext: ExtensionContext) { jdbcOperations = SpringExtension.getApplicationContext(extensionContext) .getBean(NamedParameterJdbcOperations::class.java) clearDB() } override fun afterEach(extensionContext: ExtensionContext) { clearDB() } private fun clearDB() { listOf("table_one", "table_two", "table_three").forEach { tableName: String -> jdbcOperations!!.update("TRUNCATE $tableName", MapSqlParameterSource()) } } } The downside here is that declaring the field as nullable is permanent; there’s no mechanism to declare a type as nullable “only” until its value has been assigned elsewhere. Thus, this approach forces the developer to force the non-nullable conversion whenever accessing the field, in this case using the double-bang (i.e. !!) operator to access the field’s update() function. 3. Utilize the lateinit keyword to postpone a value assignment to jdbcOperations until the execution of the beforeAll() function: Kotlin class DBExtension : BeforeAllCallback, AfterEachCallback { private lateinit var jdbcOperations: NamedParameterJdbcOperations override fun beforeAll(extensionContext: ExtensionContext) { jdbcOperations = SpringExtension.getApplicationContext(extensionContext) .getBean(NamedParameterJdbcOperations::class.java) clearDB() } override fun afterEach(extensionContext: ExtensionContext) { clearDB() } private fun clearDB() { listOf("table_one", "table_two", "table_three").forEach { tableName: String -> jdbcOperations.update("TRUNCATE $tableName", MapSqlParameterSource()) } } } No more worrying about silently invalid behavior or being forced to “de-nullify” the field each time it’s being accessed! 
The “catch” is that there’s still no compile-time mechanism for determining whether the field has been accessed before it’s been assigned a value - it’s done at run-time, as can be seen when decompiling the clearDB() function: Java private final void clearDB() { Iterable $this$forEach$iv = (Iterable)CollectionsKt.listOf(new String[]{"table_one", "table_two", "table_three"}); int $i$f$forEach = false; NamedParameterJdbcOperations var10000; String tableName; for(Iterator var3 = $this$forEach$iv.iterator(); var3.hasNext(); var10000.update("TRUNCATE " + tableName, (SqlParameterSource)(new MapSqlParameterSource()))) { Object element$iv = var3.next(); tableName = (String)element$iv; int var6 = false; var10000 = this.jdbcOperations; if (var10000 == null) { Intrinsics.throwUninitializedPropertyAccessException("jdbcOperations"); } } } Not ideal, considering what’s arguably Kotlin’s star feature (compile-time checking of variable nullability to reduce the likelihood of the “Billion-Dollar Mistake”) - but again, it’s a “least-worst” compromise to bridge the gap between Kotlin code and the Java-based code that provides no alternatives that adhere to Kotlin’s design philosophy. Use Wisely! Aside from the above-mentioned issue of conducting null checks only at run-time instead of compile-time, lateinit possesses a few more drawbacks: A field that uses lateinit cannot be an immutable val, as its value is being assigned at some point after the field’s declaration, so the field is exposed to the risk of inadvertently being modified at some point by an unwitting developer and causing logic errors. Because the field is not instantiated upon declaration, any other fields that rely on this field - be it via some function call to the field or passing it in as an argument to a constructor - cannot be instantiated upon declaration as well. This makes lateinit a bit of a “viral” feature: using it on field A forces other fields that rely on field A to use lateinit as well. Given that this mutability of lateinit fields goes against another one of Kotlin’s guiding principles - make fields and variables immutable where possible (for example, function arguments are completely immutable) to avoid logic errors by mutating a field/variable that shouldn’t have been changed - its use should be restricted to where no alternatives exist. Unfortunately, several code patterns that are prevalent in Spring Boot and Mockito - and likely elsewhere, but that’s outside the scope of this article - were built on Java’s tendency to permit uninstantiated field declarations. This is where the ease of transcribing Java code to Kotlin code becomes a double-edged sword: it’s easy to simply move the Java code over to a Kotlin file, slap the lateinit keyword on a field that hasn’t been directly instantiated in the Java code, and call it a day. 
Take, for instance, a test class that: Auto-wires a bean that’s been registered in the Spring Boot component ecosystem Injects a configuration value that’s been loaded in the Spring Boot environment Mocks a field’s value and then passes said mock into another field’s object Creates an argument captor for validating arguments that are passed to specified functions during the execution of one or more test cases Instantiates a mocked version of a bean that has been registered in the Spring Boot component ecosystem and passes it to a field in the test class Here is the code for all of these points put together: Kotlin @SpringBootTest @ExtendWith(MockitoExtension::class) @AutoConfigureMockMvc class FooTest { @Autowired private lateinit var mockMvc: MockMvc @Value("\${foo.value}") private lateinit var fooValue: String @Mock private lateinit var otherFooRepo: OtherFooRepo @InjectMocks private lateinit var otherFooService: OtherFooService @Captor private lateinit var timestampCaptor: ArgumentCaptor<Long> @MockBean private lateinit var fooRepo: FooRepo // Tests below } A better world is possible! Here are ways to avoid each of these constructs so that one can write “good” idiomatic Kotlin code while still retaining the use of auto wiring, object mocking, and argument capturing in the tests. Becoming “Punctual” Note: The code in these examples uses Java 17, Kotlin 1.9.21, Spring Boot 3.2.0, and Mockito 5.7.0. @Autowired/@Value Both of these constructs originate in the historic practice of having Spring Boot inject the values for the fields in question after their containing class has been initialized. This practice has since been deprecated in favor of declaring the values that are to be injected into the fields as arguments for the class’s constructor. For example, this code follows the old practice: Kotlin @Service class FooService { @Autowired private lateinit var fooRepo: FooRepo @Value("\${foo.value}") private lateinit var fooValue: String } It can be updated to this code: Kotlin @Service class FooService( private val fooRepo: FooRepo, @Value("\${foo.value}") private val fooValue: String, ) { } Note that aside from being able to use the val keyword, the @Autowired annotation can be removed from the declaration of fooRepo as well, as the Spring Boot injection mechanism is smart enough to recognize that fooRepo refers to a bean that can be instantiated and passed in automatically. Omitting the @Autowired annotation isn’t possible for testing code: test files aren't actually a part of the Spring Boot component ecosystem, and thus, won’t know by default that they need to rely on the auto-wired resource injection system - but otherwise, the pattern is the same: Kotlin @SpringBootTest @ExtendWith(MockitoExtension::class) @AutoConfigureMockMvc class FooTest( @Autowired private val mockMvc: MockMvc, @Value("\${foo.value}") private val fooValue: String, ) { @Mock private lateinit var otherFooRepo: OtherFooRepo @InjectMocks private lateinit var otherFooService: OtherFooService @Captor private lateinit var timestampCaptor: ArgumentCaptor<Long> @MockBean private lateinit var fooRepo: FooRepo // Tests below } @Mock/@InjectMocks The Mockito extension for JUnit allows a developer to declare a mock object and leave the actual mock instantiation and resetting of the mock’s behavior - as well as injecting these mocks into the dependent objects like otherFooService in the example code - to the code within MockitoExtension. 
Aside from the disadvantages mentioned above about being forced to use mutable objects, it poses quite a bit of “magic” around the lifecycle of the mocked objects that can be easily avoided by directly instantiating and manipulating the behavior of said objects: Kotlin @SpringBootTest @ExtendWith(MockitoExtension::class) @AutoConfigureMockMvc class FooTest( @Autowired private val mockMvc: MockMvc, @Value("\${foo.value}") private val fooValue: String, ) { private val otherFooRepo: OtherFooRepo = mock() private val otherFooService = OtherFooService(otherFooRepo) @Captor private lateinit var timestampCaptor: ArgumentCaptor<Long> @MockBean private lateinit var fooRepo: FooRepo @AfterEach fun afterEach() { reset(otherFooRepo) } // Tests below } As can be seen above, a post-execution hook is now necessary to clean up the mocked object otherFooRepo after the test execution(s), but this drawback is more than made up for by otherfooRepo and otherFooService now being immutable as well as having complete control over both objects’ lifetimes. @Captor Just as with the @Mock annotation, it’s possible to remove the @Captor annotation from the argument captor and declare its value directly in the code: Kotlin @SpringBootTest @AutoConfigureMockMvc class FooTest( @Autowired private val mockMvc: MockMvc, @Value("\${foo.value}") private val fooValue: String, ) { private val otherFooRepo: OtherFooRepo = mock() private val otherFooService = OtherFooService(otherFooRepo) private val timestampCaptor: ArgumentCaptor<Long> = ArgumentCaptor.captor() @MockBean private lateinit var fooRepo: FooRepo @AfterEach fun afterEach() { reset(otherFooRepo) } // Tests below } While there’s a downside in that there’s no mechanism in resetting the argument captor after each test (meaning that a call to getAllValues() would return artifacts from other test cases’ executions), there’s the case to be made that an argument captor could be instantiated as an object within only the test cases where it is to be used and done away with using an argument captor as a test class’s field. In any case, now that both @Mock and @Captor have been removed, it’s possible to remove the Mockito extension as well. @MockBean A caveat here: the use of mock beans in Spring Boot tests could be considered a code smell, signaling that, among other possible issues, the IO layer of the application isn’t being properly controlled for integration tests, that the test is de-facto a unit test and should be rewritten as such, etc. Furthermore, too much usage of mocked beans in different arrangements can cause test execution times to spike. Nonetheless, if it’s absolutely necessary to use mocked beans in the tests, a solution does exist for converting them into immutable objects. As it turns out, the @MockBean annotation can be used not just on field declarations, but also for class declarations as well. Furthermore, when used at the class level, it’s possible to pass in the classes that are to be declared as mock beans for the test in the value array for the annotation. 
This results in the mock bean now being eligible to be declared as an @Autowired bean just like any “normal” Spring Boot bean being passed to a test class: Kotlin @SpringBootTest @AutoConfigureMockMvc @MockBean(value = [FooRepo::class]) class FooTest( @Autowired private val mockMvc: MockMvc, @Value("\${foo.value}") private val fooValue: String, @Autowired private val fooRepo: FooRepo, ) { private val otherFooRepo: OtherFooRepo = mock() private val otherFooService = OtherFooService(otherFooRepo) private val timestampCaptor: ArgumentCaptor<Long> = ArgumentCaptor.captor() @AfterEach fun afterEach() { reset(fooRepo, otherFooRepo) } // Tests below } Note that like otherFooRepo, the object will have to be reset in the cleanup hook. Also, there’s no indication that fooRepo is a mocked object as it’s being passed to the constructor of the test class, so writing patterns like declaring all mocked beans in an abstract class and then passing them to specific extending test classes when needed runs the risk of “out of sight, out of mind” in that the knowledge that the bean is mocked is not inherently evident. Furthermore, better alternatives to mocking beans exist (for example, WireMock and Testcontainers) to handle mocking out the behavior of external components. Conclusion Note that each of these techniques is possible for code written in Java as well and provides the very same benefits of immutability and control of the objects’ lifecycles. What makes these recommendations even more pertinent to Kotlin is that they allow the user to align more closely with Kotlin’s design philosophy. Kotlin isn’t simply “Java with better typing:" It’s a programming language that places an emphasis on reducing common programming errors like accidentally accessing null pointers as well as items like inadvertently re-assigning objects and other pitfalls. Going beyond merely looking up the tools that are at one’s disposal in Kotlin to finding out why they’re available in the form that they exist will yield dividends of much higher productivity in the language, less risks of trying to fight against the language instead of focusing on solving the tasks at hand, and, quite possibly, making writing code in the language an even more rewarding and fun experience.
The calculation of the norm of vectors is essential in both artificial intelligence and quantum computing for tasks such as feature scaling, regularization, distance metrics, convergence criteria, representing quantum states, ensuring unitarity of operations, error correction, and designing quantum algorithms and circuits. You will learn how to calculate the Euclidean norm (also known as the L2 norm) of a single-dimensional (1D) tensor in Python libraries like NumPy, SciPy, Scikit-learn, TensorFlow, and PyTorch.

Understand Norm vs. Distance

Before we begin, let's understand the difference between the Euclidean norm and the Euclidean distance. The norm is the length (magnitude) of a vector measured from the origin (0,0). The distance is the length of the difference between two vectors. (A short sketch tying the two together appears at the end of this article.)

Prerequisites

Install Jupyter, then run the code below in a Jupyter Notebook to install the prerequisites.

Python

# Install the prerequisites for you to run the notebook
!pip install numpy
!pip install scipy
!pip install torch
!pip install tensorflow
# scikit-learn and matplotlib are also used later in this article
!pip install scikit-learn
!pip install matplotlib

You will use Jupyter Notebook to run the Python code cells that calculate the L2 norm in the different Python libraries.

Let's Get Started

Now that you have Jupyter set up on your machine and have installed the required Python libraries, let's get started by defining a 1D tensor using NumPy.

NumPy

NumPy is a Python library used for scientific computing. NumPy provides a multidimensional array and other derived objects.

[Figure: tensor ranks]

Python

# Define two single-dimensional (1D) tensors
import numpy as np

vector1 = np.array([3,7]) #np.random.randint(1,5,2)
vector2 = np.array([5,2]) #np.random.randint(1,5,2)
print("Vector 1:",vector1)
print("Vector 2:",vector2)
print("shape & size of Vector1 & Vector2:", vector1.shape, vector1.size)

Printing the vectors produces:

Plain Text

Vector 1: [3 7]
Vector 2: [5 2]
shape & size of Vector1 & Vector2: (2,) 2

Matplotlib

Matplotlib is a Python visualization library for creating static, animated, and interactive visualizations. You will use Matplotlib's quiver to plot the vectors.

Python

# Draw the vectors using Matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

origin = np.array([0,0])
plt.quiver(*origin, vector1[0], vector1[1], angles='xy', color='r', scale_units='xy', scale=1)
plt.quiver(*origin, vector2[0], vector2[1], angles='xy', color='b', scale_units='xy', scale=1)
plt.plot([vector1[0],vector2[0]], [vector1[1],vector2[1]], 'go', linestyle="--")
plt.title('Vector Representation')
plt.xlim([0,10])
plt.ylim([0,10])
plt.grid()
plt.show()

[Figure: vector representation using Matplotlib]

Python

# L2 (Euclidean) norm of a vector
# NumPy
norm1 = np.linalg.norm(vector1, ord=2)
print("The magnitude / distance from the origin", norm1)

norm2 = np.linalg.norm(vector2, ord=2)
print("The magnitude / distance from the origin", norm2)

The output once you run this in the Jupyter Notebook:

Plain Text

The magnitude / distance from the origin 7.615773105863909
The magnitude / distance from the origin 5.385164807134504

SciPy

SciPy is built on NumPy and is used for mathematical computations. As you can see below, SciPy exposes the same linalg.norm function as NumPy.
Python

# SciPy
import scipy

norm_vector1 = scipy.linalg.norm(vector1, ord=2)
print("L2 norm in scipy for vector1:", norm_vector1)

norm_vector2 = scipy.linalg.norm(vector2, ord=2)
print("L2 norm in scipy for vector2:", norm_vector2)

Output:

Plain Text

L2 norm in scipy for vector1: 7.615773105863909
L2 norm in scipy for vector2: 5.385164807134504

Scikit-Learn

As the Scikit-learn documentation says: "Scikit-learn is an open source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection, model evaluation, and many other utilities."

We reshape the vector because Scikit-learn expects it to be two-dimensional.

Python

# Sklearn
from sklearn.metrics.pairwise import euclidean_distances

vector1_reshape = vector1.reshape(1,-1)
## Scikit-learn expects the vector to be 2-dimensional
euclidean_distances(vector1_reshape, [[0, 0]])[0,0]

Output:

Plain Text

7.615773105863909

TensorFlow

TensorFlow is an end-to-end machine learning platform.

Python

# TensorFlow
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'
import tensorflow as tf

print("TensorFlow version:", tf.__version__)

## TensorFlow expects tensors of type float32, float64, complex64, or complex128
vector1_tf = vector1.astype(np.float64)
tf_norm = tf.norm(vector1_tf, ord=2)
print("Euclidean(l2) norm in TensorFlow:", tf_norm.numpy())

The output prints the version of TensorFlow and the L2 norm:

Plain Text

TensorFlow version: 2.15.0
Euclidean(l2) norm in TensorFlow: 7.615773105863909

PyTorch

PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.

Python

# PyTorch
import torch

print("PyTorch version:", torch.__version__)
norm_torch = torch.linalg.norm(torch.from_numpy(vector1_tf), ord=2)
norm_torch.item()

The output prints the PyTorch version and the norm:

Plain Text

PyTorch version: 2.1.2
7.615773105863909

Euclidean Distance

Euclidean distance is calculated in the same way as a norm, except that you first compute the difference between the two vectors and then pass that difference (vector_diff, in this case) to the respective libraries.

Python

# Euclidean distance between the vectors
import math

vector_diff = vector1 - vector2

# Using norm
euclidean_distance = np.linalg.norm(vector_diff, ord=2)
print(euclidean_distance)

# Using dot product
norm_dot = math.sqrt(np.dot(vector_diff.T, vector_diff))
print(norm_dot)

Output using the norm and dot functions of the NumPy library:

Plain Text

5.385164807134504
5.385164807134504

Python

# SciPy
from scipy.spatial import distance

distance.euclidean(vector1, vector2)

Output using SciPy:

Plain Text

5.385164807134504

The Jupyter Notebook with the outputs is available in the GitHub repository, and you can run it on Colab by following the instructions there.
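To tie the norm-versus-distance distinction back together, here is a short, self-contained sketch (using the same example vectors [3, 7] and [5, 2] as above) that computes both quantities from the raw formula and cross-checks the distance against Scikit-learn's euclidean_distances, which was only used for the norm earlier:

Python

# Cross-check: norm (length from the origin) vs. distance (length of the difference),
# computed from the raw formula and via scikit-learn.
import math

import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

vector1 = np.array([3, 7])
vector2 = np.array([5, 2])

# L2 norm of vector1: sqrt(3^2 + 7^2)
norm_v1 = math.sqrt(sum(int(c) ** 2 for c in vector1))
print("Norm of vector1:", norm_v1)                           # 7.615773105863909

# Euclidean distance: the norm of (vector1 - vector2), i.e., sqrt((3-5)^2 + (7-2)^2)
dist_v1_v2 = math.sqrt(sum((int(a) - int(b)) ** 2 for a, b in zip(vector1, vector2)))
print("Distance between vector1 and vector2:", dist_v1_v2)   # 5.385164807134504

# The same distance via scikit-learn (inputs must be 2-D)
print(euclidean_distances(vector1.reshape(1, -1), vector2.reshape(1, -1))[0, 0])

All three values should match the outputs shown earlier in the article.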
Sameer Shukla, Sr. Software Engineer, Leading Financial Institution
Kai Wähner, Technology Evangelist, Confluent
Alvin Lee, Founder, Out of the Box Development, LLC