Jack Wotherspoon · 2025-01-30

Data Analysis with Gemini CLI

Summary

This lesson shows how to use Gemini CLI to work with conference data, from cleaning and analysis through building a visualization dashboard. Working through more than 8,000 raw records, it walks the full data pipeline: cleaning the data with a Python script, importing it into a BigQuery database, generating a marketing analysis report, and building an interactive Flask dashboard, demonstrating Gemini CLI's strengths in multi-source data integration and visualization.

Key Points

  • Gemini CLI handles many data types, including structured and unstructured data such as CSVs, Google Docs, PDFs, source code, and images
  • The processing pipeline: a Python script cleans 8,000+ raw records, classifies them into hot, warm, and cold leads, and imports them into a BigQuery database for team access
  • The /dir add command extends the session's file access, pulling assets from other projects (such as the brand logo and color scheme) into the session state
  • Automated marketing analysis: a generated Google Doc report with insights on top companies, country distribution, job titles, and session engagement
  • A custom-built Flask dashboard incorporates the brand's visual elements for interactive data visualization

Video: Gemini CLI for Data Analysis


English Script

You can use Gemini CLI to analyze both structured and unstructured data, from CSVs to PDFs to system logs. In this lesson, you’ll clean, analyze, and visualize conference lead scan data to build an interactive dashboard. You’ll see how Gemini CLI handles the entire data pipeline. Here we go.

One of the strengths of Gemini CLI, or really any agentic CLI, is its ability to handle and read many different data types: structured data like CSVs, unstructured data like Google Docs and PDFs, source code, visual data like images and diagrams, or even searching the web, as we showed in a previous lesson.

For the purpose of this lesson, the tech conference is now over. It was a huge success, so congrats. But now that the conference is over, you have the challenge of figuring out what to do with all the data that you collected during the conference. As attendees walked through the conference floor or attended sessions or booths, their badges were scanned. Each scan of a badge is a data point, so you now have thousands and thousands of data points. You want to be able to drill down into the data in order to make actionable decisions for next year's conference. In order to accomplish this, you are going to want to ingest all the data with Gemini CLI, use it to help clean up the data, analyze it, and then we'll build a really nice visual dashboard so we can easily see the results.

So before we dive into using Gemini CLI for this lesson, let’s first take a look at our context file, our GEMINI.md. We’ve tailored our context file with insights about what equates to a hot lead versus a warm lead versus a cold lead, as well as some insights about how we want this to be summarized, what we want our dashboard to look like, as well as insights like who is the audience for this dashboard or the results of our analysis.
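
The lesson doesn't show the actual GEMINI.md on screen, so the snippet below is only an illustrative sketch of what such a context file might contain; the classification thresholds and audience notes are assumptions, not the real file:

```markdown
# GEMINI.md (illustrative sketch, not the lesson's actual file)

## Lead classification
- Hot: visited our booth AND attended two or more sessions
- Warm: attended at least one session OR visited the booth
- Cold: badge scanned at registration only

## Output preferences
- Summaries should be written for a marketing audience, not engineers
- The dashboard should use the conference brand colors and logo
```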

We can take a quick look at the kind of data that we are dealing with. We have thousands of entries of data from the conference, such as Scan ID, Attendee ID, and information about the attendees like their name, their email, stuff that you would expect to be collected when you get your badge scanned at a conference.

We have Gemini CLI booted up in our data analysis folder, so we're ready to get going. We'll tell Gemini CLI to take a look at our lead scan data, which we have stored locally as well as in the cloud in Google Sheets. So we're telling it that we have two data sources, our local files as well as Google Sheets, and really we're just telling it to process and clean the data. It will go through and remove duplicates or invalid data and give us a nice, final, clean dataset that we can use for our dashboards.

Gemini CLI is using the Workspace extension again to retrieve the data from Google Sheets, as well as its built-in tools to read through our local CSV files. It's going to create a local Python script that it can run to help us process the leads and clean up our dataset. You'll notice that Gemini CLI saves our Google Sheets to local files so that it can process everything in a uniform and concise way.
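
The generated script itself isn't shown, but the cleaning pass described here, deduplicating scans and dropping invalid rows, might look like this pandas sketch; the column names are assumptions based on the fields mentioned above:

```python
import pandas as pd

def clean_leads(df: pd.DataFrame) -> pd.DataFrame:
    """Deduplicate badge scans and drop rows with missing or invalid contact info."""
    df = df.drop_duplicates(subset=["attendee_id"])   # one lead per attendee
    df = df.dropna(subset=["name", "email"])          # require contact fields
    df = df[df["email"].str.contains("@", na=False)]  # crude email validity check
    return df.reset_index(drop=True)

# A few toy rows in the shape of the lesson's data.
raw = pd.DataFrame({
    "scan_id": [1, 2, 3, 4],
    "attendee_id": ["A1", "A1", "A2", "A3"],  # A1's badge was scanned twice
    "name": ["Ada", "Ada", "Grace", None],
    "email": ["ada@example.com", "ada@example.com", "grace@example.com", "n/a"],
})
clean = clean_leads(raw)
print(len(clean))  # 2 unique, valid leads remain
```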

It's going to go ahead and try to run the Python script. It looks like it's trying to use pip, but as a Python person I prefer uv, so I'll just tell Gemini CLI to use the uv tool over pip for a much nicer, cleaner virtual environment.

You can see that it's now successfully processed all of the leads. There are over 8,000 raw records in our lead scan data, with over 5,000 of those classified as unique leads. And you can see the Priority_Segment breakdown: around 1,800 hot leads, 1,900 warm leads, and 1,400 cold leads.
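
The lesson doesn't disclose the GEMINI.md criteria behind Priority_Segment, so the thresholds and column names in this sketch are purely illustrative; it only shows the shape such a classification step could take:

```python
import pandas as pd

def classify(row: pd.Series) -> str:
    """Assign a Priority_Segment; these thresholds are assumed, not the lesson's."""
    if row["booth_visit"] and row["sessions_attended"] >= 2:
        return "Hot"
    if row["booth_visit"] or row["sessions_attended"] >= 1:
        return "Warm"
    return "Cold"

leads = pd.DataFrame({
    "attendee_id": ["A1", "A2", "A3"],
    "booth_visit": [True, False, False],
    "sessions_attended": [3, 1, 0],
})
leads["Priority_Segment"] = leads.apply(classify, axis=1)
print(list(leads["Priority_Segment"]))  # ['Hot', 'Warm', 'Cold']
```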

Now that we've processed these leads, we want to actually save them to a database so that we can access them at a later date, or so team members can see the same data that we're seeing; right now it's just stored in a local CSV file. So we'll tell Gemini CLI to import the data into a BigQuery database on Google Cloud.

Gemini CLI will actually utilize the tools on my local machine. I already have the gcloud CLI installed, as well as BigQuery's bq CLI. With just a few quick commands, it's able to upload all the data to BigQuery, and now it's using that bq CLI to create the dataset. We can quickly verify this by querying how many warm leads we have in our BigQuery table. We can see that we get the same result we had before from our local analysis: there are 1,948 warm leads.
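
The exact commands scroll by quickly in the video, so this sketch only assembles plausible bq invocations with hypothetical dataset and table names, rather than claiming to be the ones Gemini CLI ran; it also assumes the bq CLI is installed and authenticated with a default project:

```python
import shlex

# Hypothetical names for the dataset and table.
DATASET = "conference_leads"
TABLE = f"{DATASET}.processed_leads"

make_dataset = ["bq", "mk", "--dataset", DATASET]
load_csv = ["bq", "load", "--autodetect", "--source_format=CSV",
            TABLE, "processed_leads.csv"]
verify = ["bq", "query", "--use_legacy_sql=false",
          f"SELECT COUNT(*) AS warm_leads FROM `{TABLE}` "
          "WHERE Priority_Segment = 'Warm'"]

# Printed rather than executed so the sketch stays inspectable; in practice
# Gemini CLI runs commands like these through its shell tool.
for cmd in (make_dataset, load_csv, verify):
    print(shlex.join(cmd))
```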

Let's now go a step further and ask Gemini CLI to do a deep analysis of the data and really find trends and important details that our marketing team might want to use in a future marketing plan. So this time Gemini CLI will create another Python script, but this one is catered more towards marketing. You can see that this data now includes things like the top companies attendees came from, the top countries, different job titles, which sessions people found engaging, as well as some strategic and marketing recommendations.
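
The marketing script isn't shown either, but the top-companies, top-countries, and job-title rollups it reports are straightforward pandas aggregations; the column names and toy rows here are again assumptions:

```python
import pandas as pd

# A toy stand-in for the cleaned lead data.
leads = pd.DataFrame({
    "company": ["Acme", "Acme", "Globex", "Initech"],
    "country": ["US", "US", "DE", "US"],
    "job_title": ["Engineer", "Manager", "Engineer", "Engineer"],
})

# The kind of rollups a marketing-focused report would include.
report = {
    "top_companies": leads["company"].value_counts().head(3).to_dict(),
    "top_countries": leads["country"].value_counts().head(3).to_dict(),
    "top_job_titles": leads["job_title"].value_counts().head(3).to_dict(),
}
print(leads["company"].value_counts().idxmax())  # Acme
```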

It saved the analysis as a Markdown file locally on our machine, but we want it as a Google Doc so that we can quickly share it out with other team members. If we head over and look in our Google Drive, we can see that Gemini CLI has successfully converted it into a Google Doc, with things like an executive summary for anyone quickly taking a look at the details, the criteria for how we classified leads as hot, warm, or cold, as well as quick audience insights and demographics.

This is awesome, and people will really enjoy it, but we can go a step further and make it even better by having Gemini CLI create a nice, visually appealing dashboard from all this data. We want the dashboard to use our logo as well as the color scheme from the brand guidelines we added to our website. By default, Gemini CLI only has access to the files in the given project, but we can use the /dir add command within Gemini CLI to add a different project into the session state, so that Gemini CLI can access files like our logo SVG and see the font colors from our website project.

So now we can go ahead and tell Gemini CLI to build the dashboard using our brand colors and conference logo. It’s going to go and scaffold out the project and add the dependencies like Flask to make this dashboard a reality. It will create the app.py Flask application as well as the HTML files incorporating our brand colors and logo.
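
The generated app.py isn't shown in full; below is a minimal sketch of what such a Flask scaffold might look like, with the route names, inline template, placeholder brand color, and logo path all assumed rather than taken from the lesson:

```python
from flask import Flask, jsonify, render_template_string

app = Flask(__name__)

# Counts standing in for the processed lead data; the warm figure matches the
# lesson, while hot and cold are the approximate figures mentioned earlier.
SEGMENTS = {"Hot": 1800, "Warm": 1948, "Cold": 1400}

# A real scaffold would use a templates/ directory; inlined here for brevity.
# The brand color and logo path are placeholders for the website assets.
PAGE = """
<html>
  <body style="color: #1a1a2e">
    <img src="/static/logo.svg" alt="Tech Stack conference logo">
    <h1>Lead Dashboard</h1>
    <ul>
    {% for name, count in segments.items() %}
      <li>{{ name }}: {{ count }}</li>
    {% endfor %}
    </ul>
  </body>
</html>
"""

@app.route("/")
def dashboard():
    return render_template_string(PAGE, segments=SEGMENTS)

@app.route("/api/segments")
def segments_api():
    return jsonify(SEGMENTS)  # the data a chart library would fetch

# To serve it: app.run(port=5001), since port 5000 was already taken in the lesson.
```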

It looks like Gemini CLI tried to start up the application, but that port is already being used on our local machine. So we’ll go ahead and just prompt it to use a different port and to try starting up the application again. And there we have it. It looks like Gemini CLI has successfully started up the Flask dashboard on port 5001.

If we head over to localhost on port 5001, where the application is living, we can see our dashboard is up and running, with the Tech Stack conference logo in the top-left corner and the color scheme from our website as well. You can see it has the same kind of data that was in our analysis report in the Google Doc, but this time visualized as a nice dashboard where we can click into different items. We can see the breakdown of the top companies and the recent hot leads.

This is just one way you can use Gemini CLI to wrangle and help you deal with complex data, but there is a vast range of other use cases out there that I hope you explore further.