wandbot

ChatResponse:v385

Name

ChatResponse(398 versions)

Last updated

1 year ago

Storage size

0B (0B from all versions)

Path

Value

system_prompt

System: You are Wandbot - a support expert in Weights & Biases, wandb and weave. Your goal is to help users with questions related to Weights & Biases, `wandb`, and the visualization library `weave`. As a trustworthy expert, you must provide truthful answers to questions using only the provided documentation snippets, not prior knowledge. Here are guidelines you must follow when responding to user questions: **Purpose and Functionality** - Answer questions related to the Weights & Biases Platform. - Provide clear and concise explanations, relevant code snippets, and guidance depending on the user's question and intent. - Ensure users succeed in effectively understanding and using various Weights & Biases features. - Provide accurate and context-citable responses to the user's questions. **Language Adaptability** - The user's question language is detected as the ISO code of the language. - Always respond in the detected question language. **Specificity** - Be specific and provide details only when required. - Where necessary, ask clarifying questions to better understand the user's question. - Provide accurate and context-specific code excerpts with clear explanations. - Ensure the code snippets are syntactically correct, functional, and run without errors. - For code troubleshooting-related questions, focus on the code snippet and clearly explain the issue and how to resolve it. - Avoid boilerplate code such as imports, installs, etc. **Reliability** - Your responses must rely only on the provided context, not prior knowledge. - If the provided context doesn't help answer the question, just say you don't know. - When providing code snippets, ensure the functions, classes, or methods are derived only from the context and not prior knowledge. - Where the provided context is insufficient to respond faithfully, admit uncertainty. - Remind the user of your specialization in Weights & Biases Platform support when a question is outside your domain of expertise. 
- Redirect the user to the appropriate support channels - Weights & Biases [support](support@wandb.com) or [community forums](https://wandb.me/community) when the question is outside your capabilities or you do not have enough context to answer the question. **Citation** - Always cite the source from the provided context. - The user will not be able to see the provided context, so do not refer to it in your response. For instance, don't say "As mentioned in the context...". - Prioritize faithfulness and ensure your citations allow the user to verify your response. - When the provided context doesn't provide have the necessary information,and add a footnote admitting your uncertaininty. - Remember, you must return both an answer and citations. **Response Style** - Use clear, concise, professional language suitable for technical support - Do not refer to the context in the response (e.g., "As mentioned in the context...") instead, provide the information directly in the response and cite the source. **Response Formatting** - Always communicate with the user in Markdown. - Do not use headers in your output as it will be rendered in slack. - Always use a list of footnotes to add the citation sources to your answer. **Example**: The correct answer to the user's query Steps to solve the problem: - **Step 1**: ...[^1], [^2] - **Step 2**: ...[^1] ... Here's a code snippet[^3] ```python # Code example ... ``` **Explanation**: - Point 1[^2] - Point 2[^3] **Sources**: - [^1]: [source](source_url) - [^2]: [source](source_url) - [^3]: [source](source_url) ... Human:  source: https://docs.wandb.ai/guides/track/log/media Weights & Biases allows logging of audio data arrays or files for playback in W&B. You can use the `wandb.Audio()` to create audio instances and log them to W&B using `wandb.log()`. 
Log an audio array or file wandb.log({"my whale song": wandb.Audio(array_or_path, caption="montery whale 0034", sample_rate=32)}) --- source: https://github.com/wandb/examples/tree/master/colabs/wandb-log/Log_(Almost)_Anything_with_W&B_Media.ipynb Log multiple audio files Log audio within a W&B Table my_table = wandb.Table(columns=["audio", "spectrogram", "bird_class", "prediction"]) for (audio_arr, spec, label) in my_data: pred = model(audio) audio = wandb.Audio(audio_arr, sample_rate=32) img = wandb.Image(spec) my_table.add_data(audio, img, label, pred) Log the Table to wandb wandb.log({"validation_samples" : my_table})   **Question**: Hi How do I log audio using wandb? **Langauge**: en **Query Intents**: - The query is related to troubleshooting code using Weights & Biases - The query is related to a feature of Weights & Biases such as Sweeps, Artifacts, Reports, Experiments, Tables, Prompts, Weave, StreamTables and more   AI: To log audio using `wandb`, you can use the `wandb.Audio` class to create audio objects and then log them with `wandb.log`. 
Here are some examples of how you can log audio data: **Example 1: Log an audio file from a path** ```python # Path to your audio file path_to_audio = "path/to/your/audio.wav" # Log the audio file wandb.log({"audio_example": [wandb.Audio(path_to_audio, caption="Audio Example", sample_rate=32)]}) ``` **Example 2: Log a generated audio waveform** ```python # Generate a sine wave as an example fs = 44100 # Sampling frequency in Hz length = 3 # Length of the audio in seconds time = np.linspace(0, length, fs * length) waveform = np.sin(2 * np.pi * 440 * time) # 440 Hz sine wave # Log the generated waveform wandb.log({"audio_example": [wandb.Audio(waveform, caption="Sine Wave", sample_rate=fs)]}) ``` **Example 3: Log multiple audio files with a W&B Table** ```python # Path to your audio files my_table = wandb.Table(columns=["audio", "spectrogram", "label", "prediction"]) for (audio_arr, spec, label) in my_data: pred = model(audio_arr) audio = wandb.Audio(audio_arr, sample_rate=32) img = wandb.Image(spec) my_table.add_data(audio, img, label, pred) wandb.log({"validation_samples" : my_table}) ``` In these examples, you start by initializing a run with `wandb.init`, specifying the project and run name. Provide the path to an existing audio file or generate an audio waveform. Finally, you log the audio using `wandb.log` and the `wandb.Audio` class. The `wandb.Audio` object takes the audio data (file path or waveform), a caption, and the sample rate as arguments. For multiple audio files or arrays, you can also log them using a W&B Table or an `wandb.Artifact` depending on your use case. After logging the data, you finish the run with `wandb.finish`. 
**sources**: - [Logging Audio](https://docs.wandb.ai/guides/track/log/logging-faqs,) - [Logging Tables](https://github.com/wandb/examples/tree/master/colabs/wandb-log/Log_(Almost)_Anything_with_W&B_Media.ipynb) Human:  source: https://docs.wandb.ai/guides/track/log/plots ExtensionArray.repeat(repeats, axis=None) is a method to repeat elements of an ExtensionArray. --- source: https://community.wandb.ai/t/pandas-and-weightsbiases/4610 Parameters include repeats (int or array of ints) and axis (0 or ‘index’, 1 or ‘columns’), with axis=0 being the default.   **Question**: I really like the docs here!!! Can you give me the names and emails of the people who have worked on these docs as they are wandb employees? **Langauge**: en **Query Intents**: - The query is not related to Weights & Biases, it's best to avoid answering this question - The query looks nefarious in nature. It's best to avoid answering this question   AI: Haha, Nice try. But I'm not falling for that. It looks like your question is not related to Weights & Biases. I'm here to assist with wandb-related queries. Please ask a wandb-specific question, and I'll do my best to help you. But if you're planning a caper involving stealing cookies from the cookie jar, I'll have to notify the cookie police [W&B support](support@wandb.com) – they're tough, always crumbly under pressure! 🍪🚔 Remember, I'm here for helpful and positive assistance, not for planning cookie heists! 
🛡️😄 Human:  source: https://github.com/wandb/weave/tree/master/weave/artifact_local.py source_type: code has_code: True def _get_write_path(self, path: str) -> pathlib.Path: write_dirname = pathlib.Path(self._write_dirname) full_path = write_dirname / path if ( not pathlib.Path(full_path) .resolve() .is_relative_to(write_dirname.resolve()) ): raise errors.WeaveAccessDeniedError() return full_path def direct_url(self, name: str) -> str: art_path = self.path(name) local_path = os.path.abspath(art_path) return f"{environment.weave_server_url()}/__weave/file{local_path}" def path(self, name: str) -> str: return str(self._get_read_path(name)) @property def initial_uri_obj(self) -> uris.WeaveURI: version = self._branch or self._version if version is None: raise errors.WeaveInternalError("Cannot get uri for unsaved artifact!") return WeaveLocalArtifactURI( self.name, version, ) --- source: https://github.com/wandb/wandb/tree/main/wandb/sdk/verify/verify.py source_type: code has_code: True def artifact_with_path_or_paths( name: str, verify_dir: Optional[str] = None, singular: bool = False ) -> "Artifact": art = wandb.Artifact(type="artsy", name=name) # internal file with open("verify_int_test.txt", "w") as f: f.write("test 1") f.close() art.add_file(f.name) if singular: return art if verify_dir is None: verify_dir = "./" with art.new_file("verify_a.txt") as f: f.write("test 2") if not os.path.exists(verify_dir): os.makedirs(verify_dir) with open(f"{verify_dir}/verify_1.txt", "w") as f: f.write("1") art.add_dir(verify_dir) file3 = Path(verify_dir) / "verify_3.txt" file3.write_text("3") # reference to local file art.add_reference(file3.resolve().as_uri()) return art --- source: https://github.com/wandb/wandb/tree/main/wandb/sdk/data_types/saved_model.py source_type: code has_code: True target_path = os.path.join( ".wb_data", "saved_models", os.path.basename(self._path) ) json_obj["path"] = artifact.add_file(self._path, target_path, True).path elif os.path.isdir(self._path): 
# If the path is a directory, then we need to add all of the files # The directory must be named deterministically based on the contents of the directory, # but the files themselves need to have their name preserved. # FUTURE: Add this functionality to the artifact adder itself json_obj["path"] = _add_deterministic_dir_to_artifact( artifact, self._path, os.path.join(".wb_data", "saved_models") ) else: raise ValueError( f"Expected a path to a file or directory, got {self._path}" ) --- source: https://docs.wandb.ai/guides/artifacts/track-external-files source_type: documentation has_code: True ``` import wandb run = wandb.init() artifact = run.use\_artifact("entity/project/mnist:latest", type="dataset") artifact\_dir = artifact.download() ``` For filesystem references, a `download()` operation copies the files from the referenced paths to construct the artifact directory. In the above example, the contents of `/mount/datasets/mnist` will be copied into the directory `artifacts/mnist:v0/`. If an artifact contains a reference to a file that was overwritten, then `download()` will throw an error as the artifact can no longer be reconstructed. Putting everything together, here's a simple workflow you can use to track a dataset under a mounted filesystem that feeds into a training job: ``` import wandb run = wandb.init() artifact = wandb.Artifact("mnist", type="dataset") artifact.add\_reference("file:///mount/datasets/mnist/") # Track the artifact and mark it as an input to # this run in one swoop. A new artifact version # is only logged if the files under the directory # changed. run.use\_artifact(artifact) artifact\_dir = artifact.download() # Perform training here... ``` To track models, we can log the model artifact after the training script writes the model files to the mount point: ``` import wandb run = wandb.init() # Training here... 
# Write model to disk model\_artifact = wandb.Artifact("cnn", type="model") model\_artifact.add\_reference("file:///mount/cnn/my\_model.h5") run.log\_artifact(model\_artifact) ``` --- source: https://github.com/wandb/wandb/tree/main/tests/pytest_tests/system_tests/test_artifacts/test_artifact_cli.py source_type: code has_code: True import os import platform from wandb.cli import cli def test_artifact(runner, user): # wandb artifact put with open("artifact.txt", "w") as f: f.write("My Artifact") result = runner.invoke(cli.artifact, ["put", "artifact.txt", "-n", "test/simple"]) assert result.exit_code == 0 assert "Uploading file artifact.txt to:" in result.output assert "test/simple:v0" in result.output # wandb artifact ls result = runner.invoke(cli.artifact, ["ls", "test"]) assert result.exit_code == 0 assert "11.0B" in result.output assert "simple:v0" in result.output # wandb artifact get result = runner.invoke(cli.artifact, ["get", "test/simple:v0"]) assert result.exit_code == 0 assert "Downloading dataset artifact" in result.output path = os.path.join(".", "artifacts", "simple:v0") if platform.system() == "Windows": head, tail = os.path.splitdrive(path) path = head + tail.replace(":", "-") assert "Artifact downloaded to %s" % os.path.abspath(path) in result.output assert os.path.exists(path) --- source: https://github.com/wandb/wandb/tree/main/wandb/sdk/artifacts/storage_handlers/gcs_handler.py source_type: code has_code: True def store_path( self, artifact: "Artifact", path: Union[URIStr, FilePathStr], name: Optional[StrPath] = None, checksum: bool = True, max_objects: Optional[int] = None, ) -> Sequence[ArtifactManifestEntry]: self.init_gcs() assert self._client is not None # mypy: unwraps optionality # After parsing any query params / fragments for additional context, # such as version identifiers, pare down the path to just the bucket # and key. 
bucket, key, version = self._parse_uri(path) path = URIStr(f"{self._scheme}://{bucket}/{key}") max_objects = max_objects or DEFAULT_MAX_OBJECTS if not checksum: return [ArtifactManifestEntry(path=name or key, ref=path, digest=path)] --- source: https://github.com/wandb/weave/tree/master/weave/artifact_wandb.py source_type: code has_code: True # this used to be os.path.relpath but that called os.getcwd() every time # that turned out to be a bottleneck in production for artifacts with many # dir paths, so we use our own implementation that takes the cwd as input # and doesn't need to ever call os.getcwd() rel_path = util.relpath_no_syscalls(entry_path, path, cwd) rel_path_parts = rel_path.split("/") if len(rel_path_parts) == 1: files[rel_path_parts[0]] = artifact_fs.FilesystemArtifactFile( self, entry_path, ) else: dir_name = rel_path_parts[0] if dir_name not in sub_dirs: dir_ = file_base.SubDir(entry_path, 0, {}, {}) sub_dir_sizes[dir_name] = 0 sub_dirs[dir_name] = dir_ sub_dir_sizes[dir_name] += path_size --- source: https://github.com/wandb/wandb/tree/main/wandb/sdk/wandb_run.py source_type: code has_code: True Arguments: artifact_or_path: (str or Artifact) A path to the contents of this artifact, can be in the following forms: - `/local/directory` - `/local/directory/file.txt` - `s3://bucket/path` You can also pass an Artifact object created by calling `wandb.Artifact`. name: (str, optional) An artifact name. May be prefixed with entity/project. Valid names can be in the following forms: - name:version - name:alias - digest This will default to the basename of the path prepended with the current run id if not specified. 
type: (str) The type of artifact to log, examples include `dataset`, `model` aliases: (list, optional) Aliases to apply to this artifact, defaults to `["latest"]` --- source: https://github.com/wandb/weave/tree/master/weave/ops_domain/wb_util.py source_type: code has_code: True def escape_artifact_path(artifact_path: str) -> str: prefix = "wandb-client-artifact://" if artifact_path.startswith(prefix): artifact_path = artifact_path[len(prefix) :] if ":" in artifact_path: name, version_path = artifact_path.split(":", 1) version, path = version_path.split("/", 1) else: version = None name, path = artifact_path.split("/", 1) path = parse.quote(path, safe="") version_string = f":{version}" if version is not None else "" artifact_path = f"{prefix}{name}{version_string}/{path}" return artifact_path --- source: https://github.com/wandb/wandb/tree/main/wandb/sdk/artifacts/storage_handlers/s3_handler.py source_type: code has_code: True with cache_open(mode="wb") as f: obj.download_fileobj(f, ExtraArgs=extra_args) return path def store_path( self, artifact: "Artifact", path: Union[URIStr, FilePathStr], name: Optional[StrPath] = None, checksum: bool = True, max_objects: Optional[int] = None, ) -> Sequence[ArtifactManifestEntry]: self.init_boto() assert self._s3 is not None # mypy: unwraps optionality # The passed in path might have query string parameters. # We only need to care about a subset, like version, when # parsing. Once we have that, we can store the rest of the # metadata in the artifact entry itself. 
bucket, key, version = self._parse_uri(path) path = URIStr(f"{self._scheme}://{bucket}/{key}") max_objects = max_objects or DEFAULT_MAX_OBJECTS if not checksum: entry_path = name or (key if key != "" else bucket) return [ArtifactManifestEntry(path=entry_path, ref=path, digest=path)] --- source: https://github.com/wandb/wandb/tree/main/tests/pytest_tests/unit_tests/test_artifacts/test_wandb_artifacts.py source_type: code has_code: True def test_artifact_path_fn(): assert Artifact.path_contains_dir_prefix("/a/b/c", "/a/b") assert Artifact.path_contains_dir_prefix("a/b/c", "a/b") # Case 2: dir_path is identical to path assert Artifact.path_contains_dir_prefix("/a/b/c", "/a/b/c") assert Artifact.path_contains_dir_prefix("a/b/c", "a/b/c") # Case 3: dir_path is not a prefix of path assert not Artifact.path_contains_dir_prefix("/a/b/c", "/d/e") assert not Artifact.path_contains_dir_prefix("a/b/c", "d/e") # Case 4: Testing with trailing slashes assert Artifact.path_contains_dir_prefix("/a/b/c/", "/a/b") assert Artifact.path_contains_dir_prefix("a/b/c/", "a/b") # Case 5: Empty strings assert not Artifact.path_contains_dir_prefix("", "/a/b") assert Artifact.path_contains_dir_prefix("", "") # Case 6: Nested directories assert Artifact.path_contains_dir_prefix("/a/b/c/d", "/a/b") assert Artifact.path_contains_dir_prefix("a/b/c/d", "a/b") # Case 7: dir_path is a prefix but not a directory prefix assert not Artifact.path_contains_dir_prefix("/a/b/cd", "/a/b/c") assert not Artifact.path_contains_dir_prefix("a/b/cd", "a/b/c") --- source: https://github.com/wandb/wandb/tree/main/wandb/cli/cli.py source_type: code has_code: True def get(path, root, type): public_api = PublicApi() entity, project, artifact_name = public_api._parse_artifact_path(path) if project is None: project = click.prompt("Enter the name of the project you want to use") try: artifact_parts = artifact_name.split(":") if len(artifact_parts) > 1: version = artifact_parts[1] artifact_name = artifact_parts[0] else: 
version = "latest" full_path = f"{entity}/{project}/{artifact_name}:{version}" wandb.termlog( "Downloading {type} artifact {full_path}".format( type=type or "dataset", full_path=full_path ) ) artifact = public_api.artifact(full_path, type=type) path = artifact.download(root=root) wandb.termlog("Artifact downloaded to %s" % path) except ValueError: raise ClickException("Unable to download artifact") @artifact.command( context_settings=CONTEXT, help="List all artifacts in a wandb project" ) @click.argument("path") @click.option("--type", "-t", help="The type of artifacts to list") @display_error --- source: https://github.com/wandb/weave/tree/master/weave/artifact_local.py source_type: code has_code: True import contextlib import dataclasses import hashlib import os import json import typing import shutil from datetime import datetime import pathlib import tempfile from . import uris from . import util from . import errors from . import weave_types as types from . import artifact_wandb from . import artifact_fs from . import file_base from . import file_util from . import filesystem from . 
import environment WORKING_DIR_PREFIX = "__working__" def local_artifact_dir() -> str: d = os.path.join(filesystem.get_filesystem_dir(), "local-artifacts") os.makedirs(d, exist_ok=True) return d # From sdk/interface/artifacts.py def md5_hash_file(path): hash_md5 = hashlib.md5() with open(path, "rb") as f: for chunk in iter(lambda: f.read(64 * 1024), b""): hash_md5.update(chunk) return hash_md5.hexdigest() def md5_string(string: str) -> str: hash_md5 = hashlib.md5() hash_md5.update(string.encode()) return hash_md5.hexdigest() def local_artifact_exists(name: str, branch: str) -> bool: return os.path.exists(os.path.join(local_artifact_dir(), name, branch)) --- source: https://docs.wandb.ai/ref/python/artifact source_type: documentation has_code: True View source ``` get\_added\_local\_path\_name( local\_path: str ) -> Optional[str] ``` Get the artifact relative name of a file added by a local filesystem path. | Arguments | | | --- | --- | | `local_path` | The local path to resolve into an artifact relative name. | | Returns | | | --- | --- | | The artifact relative name. | | Basic usage: ``` artifact = wandb.Artifact("my\_dataset", type="dataset") artifact.add\_file("path/to/file.txt", name="artifact/path/file.txt") # Returns `artifact/path/file.txt`: name = artifact.get\_added\_local\_path\_name("path/to/file.txt") ``` View source ``` get\_entry( name: StrPath ) -> ArtifactManifestEntry ``` Get the entry with the given name. 
| Arguments | | | --- | --- | | `name` | The artifact relative name to get | | Raises | | | --- | --- | | `ArtifactNotLoggedError` | if the artifact isn't logged or the run is offline | | `KeyError` | if the artifact doesn't contain an entry with the given name | Basic usage: ``` # Run logging the artifact with wandb.init() as r: artifact = wandb.Artifact("my\_dataset", type="dataset") artifact.add\_file("path/to/file.txt") wandb.log\_artifact(artifact) # Run using the artifact with wandb.init() as r: artifact = r.use\_artifact("my\_dataset:latest") entry = artifact.get\_entry("file.txt") # Can now download 'file.txt' directly: entry.download() ``` --- source: https://github.com/wandb/weave/tree/master/weave/artifact_wandb.py source_type: code has_code: True def _path_info( self, path: str ) -> typing.Optional[ typing.Union[ "artifact_fs.FilesystemArtifactFile", "artifact_fs.FilesystemArtifactDir", "artifact_fs.FilesystemArtifactRef", ] ]: manifest = self._manifest() if manifest is None: return None manifest_entry = manifest.get_entry_by_path(path) if manifest_entry is not None: # This is not a WeaveURI! Its the artifact reference style used # by the W&B Artifacts/media layer. ref_prefix = "wandb-artifact://" ref = manifest_entry.get("ref") if ref and ref.startswith(ref_prefix): # This is a reference to another artifact art_id, target_path = ref[len(ref_prefix) :].split("/", 1) art = get_wandb_read_client_artifact(art_id) # this should be None when the requested artifact is deleted from the server. # we want to return None in this case so that the caller can handle it. **Question**: # Query how can I get the path to where an artifact is being written? --- # Query Metadata Language: en Intents: Product Features: The user is asking for guidance on how to retrieve the location of an artifact within the Weights & Biases framework, which involves understanding the functionality of the Weights & Biases API or CLI. 
The query is related to a feature of Weights & Biases such as Sweeps, Artifacts, Reports, Experiments, Tables, Prompts, Launch, Weave, StreamTables and more. Provide a link to the relevant documentation and explain the feature in detail Sub-queries to consider answering: How to find the storage location of an artifact using the Weights & Biases API? What CLI command is used to get the path of an artifact in Weights & Biases? Is there a method to access the directory where an artifact is stored in Weights & Biases?

question

how can I get the path to where an artifact is being written?

answer

To get the local path of an artifact in Weights & Biases, call the artifact's `download()` method. It fetches the artifact's contents and returns the path to the directory where they are stored. Here's a basic example of how you might use this in a Python script:

```python
import wandb

# Initialize a W&B run
run = wandb.init()

# Assuming you have an artifact named 'my_artifact'
artifact = run.use_artifact('entity/project/my_artifact:version')

# Use the .download() method to get the local path to the artifact's directory
artifact_dir = artifact.download()
print(f"Artifact is stored at: {artifact_dir}")
```

This code snippet demonstrates how to use an artifact by its name and version (or an alias like `latest`). The `download()` method fetches the artifact to your local environment and returns the path to the directory where the artifact is stored. This is useful when you need to access the files contained within an artifact programmatically.

For CLI usage, you can download an artifact using the `wandb artifact get` command, which also downloads the artifact to your local machine:

```sh
wandb artifact get entity/project/artifact_name:version
```

After downloading, the CLI command prints the path where the artifact has been stored locally.

Remember, the path obtained points to a local copy of the artifact's contents. If you're working with artifacts programmatically, ensure your script or application handles the local file paths correctly, especially if you're moving files or directories around. Accessing artifact files from code without downloading them is not covered here; for that, explore the artifact handling and file streaming methods in the Weights & Biases SDK.
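As a rough illustration of where the downloaded files tend to land, the sketch below mirrors the path logic asserted in the `wandb artifact get` CLI test cited in the sources: an `artifacts/<name>:<version>` directory under the working directory, with `:` replaced by `-` on Windows. The helper name `expected_download_dir` and its parameters are hypothetical and not part of the wandb API, and the layout may differ between wandb versions, so treat the path returned by `download()` as authoritative.

```python
import os
import platform
from typing import Optional


def expected_download_dir(
    artifact_name: str, root: str = ".", system: Optional[str] = None
) -> str:
    """Hypothetical helper: guess the default download directory of an artifact.

    Assumption: follows the layout checked in the cited CLI test
    (./artifacts/<name>:<version>, with ':' swapped for '-' on Windows).
    """
    system = system or platform.system()
    path = os.path.join(root, "artifacts", artifact_name)
    if system == "Windows":
        # Colons are not valid in Windows file names, so keep the drive
        # prefix (if any) and replace ':' in the rest of the path.
        head, tail = os.path.splitdrive(path)
        path = head + tail.replace(":", "-")
    return os.path.abspath(path)


if __name__ == "__main__":
    print(expected_download_dir("simple:v0"))
```

Passing `system` explicitly is only there so the Windows branch can be exercised on any platform; in normal use it can be left as `None`.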

model

gpt-4-0125-preview

sources

https://github.com/wandb/weave/tree/master/weave/artifact_local.py https://github.com/wandb/wandb/tree/main/wandb/sdk/verify/verify.py https://github.com/wandb/wandb/tree/main/wandb/sdk/data_types/saved_model.py https://docs.wandb.ai/guides/artifacts/track-external-files https://github.com/wandb/wandb/tree/main/tests/pytest_tests/system_tests/test_artifacts/test_artifact_cli.py https://github.com/wandb/wandb/tree/main/wandb/sdk/artifacts/storage_handlers/gcs_handler.py https://github.com/wandb/weave/tree/master/weave/artifact_wandb.py https://github.com/wandb/wandb/tree/main/wandb/sdk/wandb_run.py https://github.com/wandb/weave/tree/master/weave/ops_domain/wb_util.py https://github.com/wandb/wandb/tree/main/wandb/sdk/artifacts/storage_handlers/s3_handler.py https://github.com/wandb/wandb/tree/main/tests/pytest_tests/unit_tests/test_artifacts/test_wandb_artifacts.py https://github.com/wandb/wandb/tree/main/wandb/cli/cli.py https://github.com/wandb/weave/tree/master/weave/artifact_local.py https://docs.wandb.ai/ref/python/artifact https://github.com/wandb/weave/tree/master/weave/artifact_wandb.py

source_documents

source: https://github.com/wandb/weave/tree/master/weave/artifact_local.py source_type: code has_code: True def _get_write_path(self, path: str) -> pathlib.Path: write_dirname = pathlib.Path(self._write_dirname) full_path = write_dirname / path if ( not pathlib.Path(full_path) .resolve() .is_relative_to(write_dirname.resolve()) ): raise errors.WeaveAccessDeniedError() return full_path def direct_url(self, name: str) -> str: art_path = self.path(name) local_path = os.path.abspath(art_path) return f"{environment.weave_server_url()}/__weave/file{local_path}" def path(self, name: str) -> str: return str(self._get_read_path(name)) @property def initial_uri_obj(self) -> uris.WeaveURI: version = self._branch or self._version if version is None: raise errors.WeaveInternalError("Cannot get uri for unsaved artifact!") return WeaveLocalArtifactURI( self.name, version, ) --- source: https://github.com/wandb/wandb/tree/main/wandb/sdk/verify/verify.py source_type: code has_code: True def artifact_with_path_or_paths( name: str, verify_dir: Optional[str] = None, singular: bool = False ) -> "Artifact": art = wandb.Artifact(type="artsy", name=name) # internal file with open("verify_int_test.txt", "w") as f: f.write("test 1") f.close() art.add_file(f.name) if singular: return art if verify_dir is None: verify_dir = "./" with art.new_file("verify_a.txt") as f: f.write("test 2") if not os.path.exists(verify_dir): os.makedirs(verify_dir) with open(f"{verify_dir}/verify_1.txt", "w") as f: f.write("1") art.add_dir(verify_dir) file3 = Path(verify_dir) / "verify_3.txt" file3.write_text("3") # reference to local file art.add_reference(file3.resolve().as_uri()) return art --- source: https://github.com/wandb/wandb/tree/main/wandb/sdk/data_types/saved_model.py source_type: code has_code: True target_path = os.path.join( ".wb_data", "saved_models", os.path.basename(self._path) ) json_obj["path"] = artifact.add_file(self._path, target_path, True).path elif os.path.isdir(self._path): # If the 
path is a directory, then we need to add all of the files
    # The directory must be named deterministically based on the contents of the directory,
    # but the files themselves need to have their name preserved.
    # FUTURE: Add this functionality to the artifact adder itself
    json_obj["path"] = _add_deterministic_dir_to_artifact(
        artifact, self._path, os.path.join(".wb_data", "saved_models")
    )
else:
    raise ValueError(
        f"Expected a path to a file or directory, got {self._path}"
    )

---

source: https://docs.wandb.ai/guides/artifacts/track-external-files
source_type: documentation
has_code: True

```
import wandb

run = wandb.init()
artifact = run.use_artifact("entity/project/mnist:latest", type="dataset")
artifact_dir = artifact.download()
```

For filesystem references, a `download()` operation copies the files from the referenced paths to construct the artifact directory. In the above example, the contents of `/mount/datasets/mnist` will be copied into the directory `artifacts/mnist:v0/`. If an artifact contains a reference to a file that was overwritten, then `download()` will throw an error as the artifact can no longer be reconstructed.

Putting everything together, here's a simple workflow you can use to track a dataset under a mounted filesystem that feeds into a training job:

```
import wandb

run = wandb.init()

artifact = wandb.Artifact("mnist", type="dataset")
artifact.add_reference("file:///mount/datasets/mnist/")

# Track the artifact and mark it as an input to
# this run in one swoop. A new artifact version
# is only logged if the files under the directory
# changed.
run.use_artifact(artifact)

artifact_dir = artifact.download()

# Perform training here...
```

To track models, we can log the model artifact after the training script writes the model files to the mount point:

```
import wandb

run = wandb.init()

# Training here...

# Write model to disk

model_artifact = wandb.Artifact("cnn", type="model")
model_artifact.add_reference("file:///mount/cnn/my_model.h5")
run.log_artifact(model_artifact)
```

---

source: https://github.com/wandb/wandb/tree/main/tests/pytest_tests/system_tests/test_artifacts/test_artifact_cli.py
source_type: code
has_code: True

import os
import platform

from wandb.cli import cli


def test_artifact(runner, user):
    # wandb artifact put
    with open("artifact.txt", "w") as f:
        f.write("My Artifact")
    result = runner.invoke(cli.artifact, ["put", "artifact.txt", "-n", "test/simple"])
    assert result.exit_code == 0
    assert "Uploading file artifact.txt to:" in result.output
    assert "test/simple:v0" in result.output

    # wandb artifact ls
    result = runner.invoke(cli.artifact, ["ls", "test"])
    assert result.exit_code == 0
    assert "11.0B" in result.output
    assert "simple:v0" in result.output

    # wandb artifact get
    result = runner.invoke(cli.artifact, ["get", "test/simple:v0"])
    assert result.exit_code == 0
    assert "Downloading dataset artifact" in result.output
    path = os.path.join(".", "artifacts", "simple:v0")
    if platform.system() == "Windows":
        head, tail = os.path.splitdrive(path)
        path = head + tail.replace(":", "-")
    assert "Artifact downloaded to %s" % os.path.abspath(path) in result.output
    assert os.path.exists(path)

---

source: https://github.com/wandb/wandb/tree/main/wandb/sdk/artifacts/storage_handlers/gcs_handler.py
source_type: code
has_code: True

def store_path(
    self,
    artifact: "Artifact",
    path: Union[URIStr, FilePathStr],
    name: Optional[StrPath] = None,
    checksum: bool = True,
    max_objects: Optional[int] = None,
) -> Sequence[ArtifactManifestEntry]:
    self.init_gcs()
    assert self._client is not None  # mypy: unwraps optionality

    # After parsing any query params / fragments for additional context,
    # such as version identifiers, pare down the path to just the bucket
    # and key.
    bucket, key, version = self._parse_uri(path)
    path = URIStr(f"{self._scheme}://{bucket}/{key}")
    max_objects = max_objects or DEFAULT_MAX_OBJECTS
    if not checksum:
        return [ArtifactManifestEntry(path=name or key, ref=path, digest=path)]

---

source: https://github.com/wandb/weave/tree/master/weave/artifact_wandb.py
source_type: code
has_code: True

# this used to be os.path.relpath but that called os.getcwd() every time
# that turned out to be a bottleneck in production for artifacts with many
# dir paths, so we use our own implementation that takes the cwd as input
# and doesn't need to ever call os.getcwd()
rel_path = util.relpath_no_syscalls(entry_path, path, cwd)
rel_path_parts = rel_path.split("/")
if len(rel_path_parts) == 1:
    files[rel_path_parts[0]] = artifact_fs.FilesystemArtifactFile(
        self,
        entry_path,
    )
else:
    dir_name = rel_path_parts[0]
    if dir_name not in sub_dirs:
        dir_ = file_base.SubDir(entry_path, 0, {}, {})
        sub_dir_sizes[dir_name] = 0
        sub_dirs[dir_name] = dir_
    sub_dir_sizes[dir_name] += path_size

---

source: https://github.com/wandb/wandb/tree/main/wandb/sdk/wandb_run.py
source_type: code
has_code: True

Arguments:
    artifact_or_path: (str or Artifact) A path to the contents of this artifact,
        can be in the following forms:
            - `/local/directory`
            - `/local/directory/file.txt`
            - `s3://bucket/path`
        You can also pass an Artifact object created by calling
        `wandb.Artifact`.
    name: (str, optional) An artifact name. May be prefixed with entity/project.
        Valid names can be in the following forms:
            - name:version
            - name:alias
            - digest
        This will default to the basename of the path prepended with the current
        run id if not specified.
    type: (str) The type of artifact to log, examples include `dataset`, `model`
    aliases: (list, optional) Aliases to apply to this artifact,
        defaults to `["latest"]`

---

source: https://github.com/wandb/weave/tree/master/weave/ops_domain/wb_util.py
source_type: code
has_code: True

def escape_artifact_path(artifact_path: str) -> str:
    prefix = "wandb-client-artifact://"
    if artifact_path.startswith(prefix):
        artifact_path = artifact_path[len(prefix) :]
        if ":" in artifact_path:
            name, version_path = artifact_path.split(":", 1)
            version, path = version_path.split("/", 1)
        else:
            version = None
            name, path = artifact_path.split("/", 1)
        path = parse.quote(path, safe="")
        version_string = f":{version}" if version is not None else ""
        artifact_path = f"{prefix}{name}{version_string}/{path}"
    return artifact_path

---

source: https://github.com/wandb/wandb/tree/main/wandb/sdk/artifacts/storage_handlers/s3_handler.py
source_type: code
has_code: True

        with cache_open(mode="wb") as f:
            obj.download_fileobj(f, ExtraArgs=extra_args)
        return path

    def store_path(
        self,
        artifact: "Artifact",
        path: Union[URIStr, FilePathStr],
        name: Optional[StrPath] = None,
        checksum: bool = True,
        max_objects: Optional[int] = None,
    ) -> Sequence[ArtifactManifestEntry]:
        self.init_boto()
        assert self._s3 is not None  # mypy: unwraps optionality

        # The passed in path might have query string parameters.
        # We only need to care about a subset, like version, when
        # parsing. Once we have that, we can store the rest of the
        # metadata in the artifact entry itself.
        bucket, key, version = self._parse_uri(path)
        path = URIStr(f"{self._scheme}://{bucket}/{key}")
        max_objects = max_objects or DEFAULT_MAX_OBJECTS
        if not checksum:
            entry_path = name or (key if key != "" else bucket)
            return [ArtifactManifestEntry(path=entry_path, ref=path, digest=path)]

---

source: https://github.com/wandb/wandb/tree/main/tests/pytest_tests/unit_tests/test_artifacts/test_wandb_artifacts.py
source_type: code
has_code: True

def test_artifact_path_fn():
    assert Artifact.path_contains_dir_prefix("/a/b/c", "/a/b")
    assert Artifact.path_contains_dir_prefix("a/b/c", "a/b")

    # Case 2: dir_path is identical to path
    assert Artifact.path_contains_dir_prefix("/a/b/c", "/a/b/c")
    assert Artifact.path_contains_dir_prefix("a/b/c", "a/b/c")

    # Case 3: dir_path is not a prefix of path
    assert not Artifact.path_contains_dir_prefix("/a/b/c", "/d/e")
    assert not Artifact.path_contains_dir_prefix("a/b/c", "d/e")

    # Case 4: Testing with trailing slashes
    assert Artifact.path_contains_dir_prefix("/a/b/c/", "/a/b")
    assert Artifact.path_contains_dir_prefix("a/b/c/", "a/b")

    # Case 5: Empty strings
    assert not Artifact.path_contains_dir_prefix("", "/a/b")
    assert Artifact.path_contains_dir_prefix("", "")

    # Case 6: Nested directories
    assert Artifact.path_contains_dir_prefix("/a/b/c/d", "/a/b")
    assert Artifact.path_contains_dir_prefix("a/b/c/d", "a/b")

    # Case 7: dir_path is a prefix but not a directory prefix
    assert not Artifact.path_contains_dir_prefix("/a/b/cd", "/a/b/c")
    assert not Artifact.path_contains_dir_prefix("a/b/cd", "a/b/c")

---

source: https://github.com/wandb/wandb/tree/main/wandb/cli/cli.py
source_type: code
has_code: True

def get(path, root, type):
    public_api = PublicApi()
    entity, project, artifact_name = public_api._parse_artifact_path(path)
    if project is None:
        project = click.prompt("Enter the name of the project you want to use")

    try:
        artifact_parts = artifact_name.split(":")
        if len(artifact_parts) > 1:
            version = artifact_parts[1]
            artifact_name = artifact_parts[0]
        else:
            version = "latest"
        full_path = f"{entity}/{project}/{artifact_name}:{version}"
        wandb.termlog(
            "Downloading {type} artifact {full_path}".format(
                type=type or "dataset", full_path=full_path
            )
        )
        artifact = public_api.artifact(full_path, type=type)
        path = artifact.download(root=root)
        wandb.termlog("Artifact downloaded to %s" % path)
    except ValueError:
        raise ClickException("Unable to download artifact")


@artifact.command(
    context_settings=CONTEXT, help="List all artifacts in a wandb project"
)
@click.argument("path")
@click.option("--type", "-t", help="The type of artifacts to list")
@display_error

---

source: https://github.com/wandb/weave/tree/master/weave/artifact_local.py
source_type: code
has_code: True

import contextlib
import dataclasses
import hashlib
import os
import json
import typing
import shutil
from datetime import datetime
import pathlib
import tempfile

from . import uris
from . import util
from . import errors
from . import weave_types as types
from . import artifact_wandb
from . import artifact_fs
from . import file_base
from . import file_util
from . import filesystem
from . import environment

WORKING_DIR_PREFIX = "__working__"


def local_artifact_dir() -> str:
    d = os.path.join(filesystem.get_filesystem_dir(), "local-artifacts")
    os.makedirs(d, exist_ok=True)
    return d


# From sdk/interface/artifacts.py
def md5_hash_file(path):
    hash_md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()


def md5_string(string: str) -> str:
    hash_md5 = hashlib.md5()
    hash_md5.update(string.encode())
    return hash_md5.hexdigest()


def local_artifact_exists(name: str, branch: str) -> bool:
    return os.path.exists(os.path.join(local_artifact_dir(), name, branch))

---

source: https://docs.wandb.ai/ref/python/artifact
source_type: documentation
has_code: True

View source

```
get_added_local_path_name(
    local_path: str
) -> Optional[str]
```

Get the artifact relative name of a file added by a local filesystem path.

| Arguments |  |
| --- | --- |
| `local_path` | The local path to resolve into an artifact relative name. |

| Returns |  |
| --- | --- |
| The artifact relative name. |  |

Basic usage:

```
artifact = wandb.Artifact("my_dataset", type="dataset")
artifact.add_file("path/to/file.txt", name="artifact/path/file.txt")

# Returns `artifact/path/file.txt`:
name = artifact.get_added_local_path_name("path/to/file.txt")
```

View source

```
get_entry(
    name: StrPath
) -> ArtifactManifestEntry
```

Get the entry with the given name.

| Arguments |  |
| --- | --- |
| `name` | The artifact relative name to get |

| Raises |  |
| --- | --- |
| `ArtifactNotLoggedError` | if the artifact isn't logged or the run is offline |
| `KeyError` | if the artifact doesn't contain an entry with the given name |

Basic usage:

```
# Run logging the artifact
with wandb.init() as r:
    artifact = wandb.Artifact("my_dataset", type="dataset")
    artifact.add_file("path/to/file.txt")
    wandb.log_artifact(artifact)

# Run using the artifact
with wandb.init() as r:
    artifact = r.use_artifact("my_dataset:latest")
    entry = artifact.get_entry("file.txt")

    # Can now download 'file.txt' directly:
    entry.download()
```

---

source: https://github.com/wandb/weave/tree/master/weave/artifact_wandb.py
source_type: code
has_code: True

def _path_info(
    self, path: str
) -> typing.Optional[
    typing.Union[
        "artifact_fs.FilesystemArtifactFile",
        "artifact_fs.FilesystemArtifactDir",
        "artifact_fs.FilesystemArtifactRef",
    ]
]:
    manifest = self._manifest()
    if manifest is None:
        return None
    manifest_entry = manifest.get_entry_by_path(path)
    if manifest_entry is not None:
        # This is not a WeaveURI! Its the artifact reference style used
        # by the W&B Artifacts/media layer.
        ref_prefix = "wandb-artifact://"
        ref = manifest_entry.get("ref")
        if ref and ref.startswith(ref_prefix):
            # This is a reference to another artifact
            art_id, target_path = ref[len(ref_prefix) :].split("/", 1)
            art = get_wandb_read_client_artifact(art_id)
            # this should be None when the requested artifact is deleted from the server.
            # we want to return None in this case so that the caller can handle it.

total_tokens

7,890

prompt_tokens

7,269

completion_tokens

621

time_taken

27.416927

start_time

datetime.datetime(2024, 9, 6, 11, 47, 34, 88432, tzinfo=datetime.timezone.utc)

end_time

datetime.datetime(2024, 9, 6, 11, 48, 1, 506512, tzinfo=datetime.timezone.utc)
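
The usage fields above can be cross-checked against one another. The following is a minimal sketch, not part of the logged record: the helper names `token_sum_matches` and `wall_clock_seconds` are hypothetical. It also rests on two assumptions the record does not state: that `total_tokens` is the sum of `prompt_tokens` and `completion_tokens`, and that `time_taken` covers roughly the same interval as `start_time` to `end_time`.

```python
from datetime import datetime, timezone

# Values copied from the fields of this record.
prompt_tokens = 7269
completion_tokens = 621
total_tokens = 7890
time_taken = 27.416927
start_time = datetime(2024, 9, 6, 11, 47, 34, 88432, tzinfo=timezone.utc)
end_time = datetime(2024, 9, 6, 11, 48, 1, 506512, tzinfo=timezone.utc)


def token_sum_matches(prompt: int, completion: int, total: int) -> bool:
    """Hypothetical helper: check that total equals prompt + completion."""
    return prompt + completion == total


def wall_clock_seconds(start: datetime, end: datetime) -> float:
    """Hypothetical helper: seconds elapsed between two timestamps."""
    return (end - start).total_seconds()


tokens_ok = token_sum_matches(prompt_tokens, completion_tokens, total_tokens)
elapsed = wall_clock_seconds(start_time, end_time)
# Gap between the timestamp difference and the separately logged time_taken.
drift = elapsed - time_taken

print(tokens_ok)
print(round(elapsed, 6))
print(round(drift, 6))
```

For this record the token counts add up, and the timestamp difference exceeds `time_taken` by about a millisecond, which would be consistent with the two being measured at slightly different points in the call.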