A few weeks ago (March 31 – April 4, 2025) I participated in Google's 5-day intensive Gen AI course.
It covered a variety of topics, from foundational concepts like transformer-based architectures and prompt engineering to deep-dive technical topics, for example Google's Scalable Approximate Nearest Neighbor (ScaNN) algorithm. Beyond the published whitepapers, the course was made even more useful by live sessions featuring experts from Google's various AI teams. For instance, I was able to see the big picture: how ScaNN is used in AlloyDB and why we need vector databases. The technical parts (code labs) mainly focused on implementing the concepts from the whitepapers and getting familiar with Gemini's API.
1) Prompt engineering is a must; you need to learn how to talk to AI.
2) LLMs are not just about guessing the next word. You can get creative by NOT greedily picking the next word, for example by tuning configuration parameters like temperature, top-K, and top-P.
3) Evaluation might be the most important part; measuring performance is just as important as building the model.
4) There's still time, but domain-specific applications of AI (like SecLM and MedLM) will change the way we work.
5) NotebookLM is a fantastic tool for education. Go try it now if you haven't already → https://notebooklm.google.com/
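To make point 2 concrete, here is a toy next-token sampler showing how temperature, top-K, and top-P reshape a probability distribution before sampling. This is an illustrative sketch, not Gemini's actual implementation; the vocabulary and logit values are invented for the demo.

```python
# Toy demo: how temperature, top-K, and top-P change next-token sampling.
# The vocabulary and logits below are made up for illustration.
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Apply temperature scaling, then top-K and top-P (nucleus) filtering."""
    rng = rng or random.Random(0)
    # Temperature: lower values sharpen the distribution, higher values flatten it.
    scaled = {tok: lg / temperature for tok, lg in logits.items()}
    # Softmax over the scaled logits (subtract max for numerical stability).
    m = max(scaled.values())
    exp = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exp.values())
    probs = {tok: v / total for tok, v in exp.items()}
    # Top-K: keep only the K most likely tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    # Top-P: keep the smallest prefix whose cumulative probability reaches top_p.
    if top_p is not None:
        kept, cum = [], 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            cum += p
            if cum >= top_p:
                break
        ranked = kept
    # Renormalize over the surviving tokens and sample.
    total = sum(p for _, p in ranked)
    r, cum = rng.random() * total, 0.0
    for tok, p in ranked:
        cum += p
        if r <= cum:
            return tok
    return ranked[-1][0]

logits = {"the": 3.0, "a": 2.5, "code": 1.0, "banana": -2.0}
# Very low temperature is near-greedy: the top token dominates.
print(sample_next_token(logits, temperature=0.1))  # -> "the"
```

With a high temperature and no filtering, the same call would occasionally pick "code" or even "banana"; that trade-off between determinism and creativity is exactly what these parameters control.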
Following the course, we were encouraged to build a capstone project to demonstrate what we had learned. My project, Lintelligence (the name comes from code linting, not L'intelligence), aimed to explore whether large language models could effectively analyze code for security flaws, just like a security expert. If successful, such a model could be integrated into a CI/CD pipeline to catch vulnerabilities before they reach production, ultimately reducing security risks.
I experimented with two approaches to achieve this goal.
1) Few-shot prompting, where I provided Gemini with some vulnerable code examples.
2) Retrieval-Augmented Generation (RAG), using a vector database (ChromaDB) to query similar past cases via vector search. In short: I used the CVEFixes dataset (80/20 train/test split), built embeddings, crafted prompts, parsed Gemini's output, and evaluated predictions against known safe and vulnerable code.
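The retrieval step of the RAG approach can be sketched in a few lines. This is a stand-in, not my notebook code: the embeddings here are tiny hand-made vectors (in the real pipeline they come from an embedding model, and ChromaDB handles storage and nearest-neighbor search), and the code snippets and labels are invented.

```python
# Minimal sketch of RAG retrieval + prompt assembly. Toy embeddings stand in
# for a real embedding model; the cases below mimic CVEFixes-style entries.
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# "Knowledge base": past cases with precomputed embeddings.
cases = [
    {"code": "strcpy(buf, user_input);", "label": "vulnerable", "emb": [0.9, 0.1, 0.0]},
    {"code": "strncpy(buf, user_input, sizeof(buf) - 1);", "label": "safe", "emb": [0.8, 0.3, 0.1]},
    {"code": "return a + b;", "label": "safe", "emb": [0.0, 0.2, 0.9]},
]

def retrieve(query_emb, k=2):
    """Return the k most similar past cases (the role ChromaDB's query() plays)."""
    return sorted(cases, key=lambda c: cosine(query_emb, c["emb"]), reverse=True)[:k]

def build_prompt(snippet, query_emb):
    """Assemble a prompt: retrieved cases as context, then the code to classify."""
    context = "\n".join(f"- {c['label']}: {c['code']}" for c in retrieve(query_emb))
    return (
        "You are a security reviewer. Similar past cases:\n"
        f"{context}\n"
        "Classify the following code as 'safe' or 'vulnerable':\n"
        f"{snippet}"
    )

print(build_prompt("memcpy(dst, src, n);", [0.85, 0.2, 0.05]))
```

The assembled prompt would then be sent to Gemini; swapping the toy `retrieve` for a ChromaDB collection query is the only structural change needed.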
Below you will find the results of my initial (and not successful) experiment. It's far from perfect, but I would still call it a nice first step in my journey of exploring how GenAI can be used in security engineering.
You can check out the full notebook on Kaggle here.
Results
1) Few-shot Prompting Approach:
Accuracy: 46/92 = 0.50
Invalid: 8 (response could not be parsed)
Classification Report:
              precision  recall  f1-score  support
safe               0.57    0.57      0.57       53
vulnerable         0.41    0.41      0.41       39
accuracy                            0.50       92
macro avg          0.49    0.49      0.49       92
weighted avg       0.50    0.50      0.50       92
2) RAG Approach:
Accuracy: 55/98 = 0.56
Invalid: 2 (response could not be parsed)
Classification Report:
              precision  recall  f1-score  support
safe               0.58    0.68      0.63       53
vulnerable         0.53    0.42      0.47       45
accuracy                            0.56       98
macro avg          0.55    0.55      0.55       98
weighted avg       0.56    0.56      0.55       98
While large language models like Gemini can recognize certain insecure patterns, there are some limitations:
- For few-shot prompting: effectiveness is highly sensitive to the quality of the examples. At the time of writing, I realized that I should also have included "non-vulnerable" code snippets in the example prompts. Beyond the examples, the model struggles to understand deeper logic when the given code is too long.
- For RAG: I expected it to outperform simple prompting, but the results suggest my implementation was not good enough. To improve accuracy, I plan to revisit and refine the RAG mechanism and test it with other models and datasets.
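One concrete, cheap fix for the "invalid" counts in the results above is to parse the model's free-form reply defensively instead of expecting an exact label. The sketch below is an assumption about how such a parser could look, using the same "safe"/"vulnerable" label vocabulary as the experiment:

```python
# Sketch of a lenient verdict parser: maps a free-form model reply to
# 'safe', 'vulnerable', or 'invalid'. Label names mirror the experiment above.
import re

def parse_verdict(response: str) -> str:
    """Extract the classification from a model reply, tolerating extra prose."""
    text = response.lower()
    # Check 'vulnerable' first so replies like "not safe, vulnerable"
    # resolve to vulnerable; whole-word matches avoid partial hits.
    if re.search(r"\bvulnerable\b", text):
        return "vulnerable"
    if re.search(r"\bsafe\b", text):
        return "safe"
    return "invalid"

print(parse_verdict("Verdict: VULNERABLE (possible buffer overflow)"))  # vulnerable
print(parse_verdict("This snippet looks safe to me."))                  # safe
print(parse_verdict("I cannot determine this."))                        # invalid
```

Asking the model for structured output (e.g. a single JSON field) and falling back to this kind of keyword scan would likely recover most of the 8 and 2 unparsed responses.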
Below are the course structure and materials. All credit goes to the teams at Google and Kaggle. I truly appreciate the effort they put into creating such a valuable learning experience!
Day 1 (Part 1):
"Foundational Large Language Models & Text Generation":
Day 1 (Part 2):
"Prompt Engineering":
Day 2:
"Embeddings and Vector Stores/Databases":
Day 3 (Part 1):
"Generative AI Agents":
Day 3 (Part 2):
"Agents Companion":
Day 4:
"Domain-Specific LLMs":
Day 5:
"MLOps for Generative AI":
Live streams (Q&A): https://www.youtube.com/playlist?list=PLqFaTIg4myu-lbBTrUpoQQIzZZxvrOaP5