cs

v0.0.0-...-2947a06
Published: Dec 17, 2025 License: BSD-2-Clause-Views Imports: 38 Imported by: 0

README

cs -- codesearch

This tool is a port of the livegrep web UI onto a backend powered by rsc's codesearch tool.

But why?

Livegrep is great: you get fast (almost realtime) search results over all of your company's codebase using regexes. The interface is simple, dense, and gets out of your way.

Alas, the backend indexing code has gotten rather large and is somewhat hard to follow if you're (like me) trying to figure out how to make a few surgical fixes. The build is also somewhat hard to update (if e.g. you want to upgrade Go :)) due to Bazel & various rules being a few versions behind.

CS keeps the livegrep Go+JS front end, but uses it to wrap a modified version of codesearch.

Notable differences with livegrep.
  • Backend is only Go. Easier to update.
  • Codesearch-based index that stores all of the blobs for fast access.
  • Smaller indexes, leading to faster (initial?) searches due to lower memory requirement.
  • GitHub sync & indexing built into the server process: only one always-on process to manage.

Updates to codesearch.

Storing blobs: Codesearch works by creating a trigram index over the inputs. At search time, the index is used to locate the files to search, and then a regex search is run over the files themselves. This proved slower than necessary: when opening the files, syscalls showed up as a bottleneck. Even with other storage, e.g. pulling blobs from packed git files or a SQLite DB, syscalls remained in the way. CS modifies the index format to store the indexed blobs as well. At search time, they are mmap'd into memory and the DFA-based regex is run over them.

Return matched range: Codesearch implements grep by using the DFA-based regex to find a match, then finding the line boundaries, and writing out the whole line. It has no need of tracking where in the line the match is. CS updates the regex code to stop at the end of the match (instead of the end of the line), reverses the regex, and runs it backward to find the beginning of the match.

Updates to livegrep's UI.
  • Various tracking integrations have been removed (Google Analytics, Sentry).
  • Code updated to TypeScript--mostly to enable auto completion and some error detection, but also to strategically annotate some types to help with changes.
  • All libraries updated as much as is relevant.
  • Code highlighting library replaced.
  • All libraries are bundled and served as a blob of JS relevant for the page. No more external CDN necessary. Esbuild is the bundler, run via go generate.

Usage

  • go install sgrankin.dev/cs/cmd/csweb@latest
  • Create a config file (see below).
  • Run using your favorite init system, for example: ~/go/bin/csweb -listen=localhost:8910 -config=config.yaml -rebuild-interval=30m
  • Set the GITHUB_TOKEN environment variable to a valid GitHub token to increase API rate limit & see private repos.

Config

index:  # Multiple indexes are possible if you want to keep things separate.
  path: /var/lib/codesearch  # Path to a directory that csweb can read and write.
  reposources:  # Sources of git repos.
    github:  # Only github is implemented.
      - org: "golang"  # All repos from an org.
      - user: "me"  # All repos from a user.
      - repo: "golang/go"  # A single repo.

See config.go for other fields, including how to specify raw repos and all of the livegrep UI options.
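
To show how the keys above line up with the documented struct tags, here is a small sketch that decodes a JSON rendering of the same config with encoding/json. How csweb itself loads the YAML file is not stated in this README, so this makes no claim about the real loader. The struct definitions are local mirrors of a subset of the documented fields, and parseConfig is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local mirrors of a subset of the documented config types.
type GitHubSourceConfig struct {
	Org  string `json:"org"`
	User string `json:"user"`
	Repo string `json:"repo"`
}

type RepoSourceConfig struct {
	GitHub []GitHubSourceConfig `json:"github"`
}

type IndexConfig struct {
	Name        string           `json:"name"`
	Path        string           `json:"path"`
	RepoSources RepoSourceConfig `json:"reposources"`
}

type Config struct {
	Index IndexConfig `json:"index"`
}

// exampleJSON is the YAML example from this README, rendered as JSON.
const exampleJSON = `{
  "index": {
    "path": "/var/lib/codesearch",
    "reposources": {
      "github": [
        {"org": "golang"},
        {"user": "me"},
        {"repo": "golang/go"}
      ]
    }
  }
}`

// parseConfig decodes a JSON document into Config (hypothetical helper).
func parseConfig(data []byte) (Config, error) {
	var cfg Config
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	cfg, err := parseConfig([]byte(exampleJSON))
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.Index.Path, len(cfg.Index.RepoSources.GitHub))
}
```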

Index directory

There are a few things in the index directory. You will need to allocate enough space for:

  • The actual index.
  • A temp copy of the index that's built on reindex.
  • A git repository that all of the remote repos are fetched into during indexing. The fetch is incremental, but the repo is never GC'd due to limits in go-git. You may want to run git maintenance on this repo.

Hacking

Building
  • go generate ./... when updating web assets.
  • go run ./cmd/csweb to run locally.

Wishlist (maybe?)
  • Trigger repo updates on webhook push (so that you don't have to wait for a poll reindex).
  • Incremental updates where just the updated repo is rebuilt. Codesearch supports an index merge, so it would be easy to cache an index per repo and then merge them all whenever a component is updated. This was a motivation for replacing the backend... but in practice, CS is faster than livegrep at indexing, so this no longer seems necessary.
  • Cleanup: simplify the logger used to not double log time, etc.
  • Metrics, maybe, so that you know when something broke.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func BuildIndex

func BuildIndex(path string, repos []Repo) error

BuildIndex creates a new index at `path` with the given `repos`.

func BuildSearchIndex

func BuildSearchIndex(cfg IndexConfig, githubToken string) error

func EmbedFSServer

func EmbedFSServer(root embed.FS) http.Handler

EmbedFSServer creates an HTTP static file server over the given embed.FS.

The main distinction from http.FileServerFS is that it calculates and caches hashes for the underlying file system. An embed.FS is required since it can be assumed immutable and fixed, so hashes are cached indefinitely.

func FlagVar

func FlagVar[T any, PT interface {
	flag.Value
	*T
}](name string, value T, usage string) PT

FlagVar is equivalent to flag.Var, but handles allocation of the named type.

func NewSearchIndex

func NewSearchIndex(cfg IndexConfig) *searchIndex

Types

type Bounds

type Bounds struct {
	Left, Right int
}

type CodeSearchResult

type CodeSearchResult struct {
	Stats       SearchStats
	Results     []SearchResult
	FileResults []FileResult
	Facets      []Facet

	// unique index identity that served this request
	IndexName string
	IndexTime int
}

type Config

type Config struct {
	Index IndexConfig `json:"index"`
	Serve ServeConfig `json:"serve"`
}

type EnvString

type EnvString string

EnvString is a flag.Value that expands environment variables when fetched.

func (*EnvString) Get

func (s *EnvString) Get() string

func (*EnvString) Set

func (s *EnvString) Set(val string) error

func (*EnvString) String

func (s *EnvString) String() string

type ExitReason

type ExitReason string
const (
	ExitReasonNone       ExitReason = "NONE"
	ExitReasonTimeout    ExitReason = "TIMEOUT"
	ExitReasonMatchLimit ExitReason = "MATCH_LIMIT"
)

func (ExitReason) String

func (v ExitReason) String() string

type Facet

type Facet struct {
	Key    string
	Values []FacetValue
}

type FacetValue

type FacetValue struct {
	Value string
	Count int
}

type File

type File struct {
	Tree, Version, Path string
}

func (File) Less

func (f File) Less(b File) bool

type FileResult

type FileResult struct {
	File   File
	Bounds Bounds
}

type GitHubSourceConfig

type GitHubSourceConfig struct {
	// One and only one of org, user, or repo must be set.
	Org  string `json:"org"`
	User string `json:"user"`
	Repo string `json:"repo"`

	Ref      string `json:"ref"` // Defaults to HEAD.
	Archived bool   `json:"archived"`
	Forks    bool   `json:"forks"`  // Include archived or forked repos.
	Reject   string `json:"reject"` // Regexp, if not empty, to filter out repos from the index.
}
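
The "one and only one" rule and the Reject regexp both lend themselves to an early check. The following is a hypothetical validation sketch over a local mirror of the struct; validateSource is not part of the package, and the package may enforce these constraints differently or not at all.

```go
package main

import (
	"fmt"
	"regexp"
)

// Local mirror of the documented fields used by the check.
type GitHubSourceConfig struct {
	Org    string `json:"org"`
	User   string `json:"user"`
	Repo   string `json:"repo"`
	Reject string `json:"reject"`
}

// validateSource checks that exactly one of Org, User, or Repo is set
// and that Reject, when present, compiles as a regular expression.
func validateSource(c GitHubSourceConfig) error {
	set := 0
	for _, v := range []string{c.Org, c.User, c.Repo} {
		if v != "" {
			set++
		}
	}
	if set != 1 {
		return fmt.Errorf("exactly one of org, user, or repo must be set, got %d", set)
	}
	if c.Reject != "" {
		if _, err := regexp.Compile(c.Reject); err != nil {
			return fmt.Errorf("invalid reject pattern: %w", err)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateSource(GitHubSourceConfig{Org: "golang"}))
	fmt.Println(validateSource(GitHubSourceConfig{Org: "golang", User: "me"}))
}
```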

type IndexConfig

type IndexConfig struct {
	Name string `json:"name"` // The name of this grouping of repos.  Defaults to path basename.
	Path string `json:"path"` // A directory holding all of the indexes and git data.

	Repos       []RepoConfig     `json:"repos"`       // Repositories to fetch.
	RepoSources RepoSourceConfig `json:"reposources"` // Sources that expand into more repos.
}

type IndexInfo

type IndexInfo struct {
	IndexTime time.Time
	Trees     []Tree
}

type LineResult

type LineResult struct {
	LineNumber                  int
	Line                        string
	Bounds                      Bounds
	ContextBefore, ContextAfter []string
}

type Query

type Query struct {
	Line string // Freeform regexp

	// File & Repo inclusion/exclusion regexps:
	File, NotFile                []string
	Repo, NotRepo, Tags, NotTags string

	RepoFilter []string // Additional exact-match repository filter

	FoldCase     bool // Ignore case when searching.
	MaxMatches   int  // Max matches to return.  Must be set.
	FilenameOnly bool // Search for `Line` only in file names.
	ContextLines int  // Results have the N lines before and after matched line.
}

type Repo

type Repo interface {
	// Name is the unique ref name this Repo tracks in the underlying repository.
	Name() string

	// For git, Version yields the SHA1 of the commit.
	Version() Version

	// Files yields all of the files in the tree at the current Version.
	// For git, Refresh may be called concurrently, unless it deletes underlying objects.
	Files(yield func(RepoFile) error) error

	// Refresh fetches the remote and updates the local tracking ref.
	Refresh() (Version, error)
}

type RepoConfig

type RepoConfig struct {
	Name      string // Optional, defaults to base path of URL.  Must be unique.
	RemoteURL string // Remote URL.
	RemoteRef string // Remote reference / branch / etc.
}

func ResolveFetchSpecs

func ResolveFetchSpecs(client *github.Client, specs []GitHubSourceConfig, auth *url.Userinfo) ([]RepoConfig, error)

type RepoFile

type RepoFile interface {
	Path() string
	Size() int
	FileMode() fs.FileMode
	Reader() io.ReadCloser
}

type RepoSourceConfig

type RepoSourceConfig struct {
	GitHub []GitHubSourceConfig `json:"github"`
}

type SearchIndex

type SearchIndex interface {
	// User visible name of the index
	Name() string

	// Info returns the metadata about the index.
	// Background data updates may cause this info to update.
	Info() IndexInfo

	// Paths returns the list of file paths in this index.
	Paths(tree, version, pathPrefix string) []File

	// Data returns the full data for the file at the path.
	// If the path is not found in this index, data returned will be empty.
	Data(tree, version, path string) string

	// Search returns search results.
	// Errors will be returned if the query is invalid.
	// The context may be used to cancel the search.
	Search(ctx context.Context, q Query) (*CodeSearchResult, error)

	// Reload will refresh the index if it has been changed on disk.
	Reload()
}

type SearchResult

type SearchResult struct {
	File  File
	Lines []LineResult
}

type SearchStats

type SearchStats struct {
	TotalTime int64

	ExitReason ExitReason
}

type ServeConfig

type ServeConfig struct {
	DefaultMaxMatches int `json:"default_max_matches"`

	Templates struct {
		// TODO: move page customizations into a sub struct.
		Feedback struct {
			MailTo string `json:"mailto"` // The mailto address for the "feedback" url.
		} `json:"feedback"`

		HeaderHTML template.HTML `json:"header_html"` // HTML injected into layout template for site-specific customizations.
		FooterHTML template.HTML `json:"footer_html"` // HTML injected into layout template just before </body> for site-specific customization.
	} `json:"templates"`
}

type Tree

type Tree struct {
	Name, Version string
}

type Version

type Version = string

Directories

Path                     Synopsis
cmd/csbuild              command
cmd/csweb                command
codesearch/regexp        Package regexp implements regular expression search tuned for use in grep-like programs.
codesearch/sparse        Package sparse implements sparse sets.
livegrep/server/build    command
livegrep/server/gencss   command
livegrep/server/views    templ: version: v0.3.960
