Sr Data Architect Resume

Venice, FL

SUMMARY

  • Dynamic and motivated IT professional with over 9 years of experience in the field of Big Data.
  • Expertise in designing data-intensive applications using the Hadoop ecosystem, Big Data analytics, cloud data engineering, data modeling, data warehouse/data mart, data visualization, reporting, and data quality solutions.
  • Experience with Big Data technologies such as Amazon Web Services (AWS), Microsoft Azure, GCP, Databricks, Kafka, Spark, Hive, Sqoop, and Hadoop.
  • Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, IAM, Amazon Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, EMR, and other services in the AWS family.
  • In-depth understanding of Hadoop architecture and its various components, such as HDFS, YARN, and MapReduce.
  • Understanding of Delta Lake architecture, including Delta Lake tables, transactions, schema enforcement, and time travel capabilities.
  • Hands-on experience working with data warehouses and designing performant models.
  • Expertise with NoSQL databases such as HBase and Cassandra.
  • Well versed in Spark performance tuning at the source, target, and data stage job levels using indexes, hints, and partitioning.
  • Worked on data governance and data quality.
  • Utilized PL/SQL and SQL to create queries and develop Python-based designs and programs.
  • Optimized existing data processes to increase efficiency and performance.
  • Experienced in designing and developing data pipelines using GCP data services such as BigQuery, Cloud Dataflow, Cloud Dataproc, and Cloud Composer.
  • Experience with stream processing using PySpark and Kafka.
  • Prepared test cases and documented and performed unit and integration testing.
  • Automatically detected and ingested new files into a data lake using Databricks Auto Loader (see the sketch after this list).
  • Experienced in creating Azure Data Factory pipelines, datasets, and linked services.
  • Proven experience in developing and deploying data engineering solutions on Databricks.
  • Expertise in creating, debugging, scheduling, and monitoring jobs using Airflow and Oozie.
  • Demonstrated knowledge of automation testing and the SDLC under both Waterfall and Agile models.
  • Created tailored reports that extracted data for use by reporting tools such as Tableau, Power BI, and Amazon QuickSight.
  • Skilled in setting up highly available Airflow production environments.
  • Utilized containerization technologies such as Docker and Kubernetes to build and deploy applications.
  • Developed scripts and automation tools to improve development and deployment processes.
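
A minimal sketch of the Auto Loader ingestion pattern mentioned above, assuming a Databricks notebook where spark is already available; the file format, storage paths, and target table name are placeholders:

    # Minimal Databricks Auto Loader sketch; paths and table names are assumed placeholders.
    # Auto Loader ("cloudFiles") incrementally detects and ingests new files from cloud storage.
    raw_stream = (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")                               # incoming file format
        .option("cloudFiles.schemaLocation", "/mnt/lake/_schemas/events")  # schema tracking location
        .load("/mnt/lake/raw/events/")                                     # landing zone to watch
    )

    (
        raw_stream.writeStream
        .option("checkpointLocation", "/mnt/lake/_checkpoints/events")
        .trigger(availableNow=True)                                        # process new files, then stop
        .toTable("bronze.events")                                          # Delta table in the lake
    )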

TECHNICAL SKILLS

Programming Languages: Python, Scala, Java, C, C++, JavaScript, SQL

Databases: MS SQL Server, Oracle, DB2, MySQL, PostgreSQL.

NoSQL Databases: HBase, Cassandra, MongoDB.

Cloud Platforms: AWS, MS Azure, GCP

Big Data Primary Skills: MapReduce, Sqoop, Hive, Kafka, Spark, Cloudera, Databricks, Zookeeper

CI/CD: GitHub, Docker, Jenkins, Terraform, and Kubernetes.

Orchestration Tools: Airflow, Oozie, Cloud Composer, AWS MWAA

Data Analytics: Tableau, Microsoft Power BI, Amazon QuickSight.

Operating Systems: UNIX/Linux, Windows

Cloud Services: AWS S3, EMR, Lambda, Step Functions, Redshift, Redshift Spectrum, RDS, QuickSight, DynamoDB, CloudFormation, CloudWatch, SNS, SES, SQS; Azure Data Factory, Azure Databricks, Azure Data Lake Gen2, Azure SQL, Azure HDInsight; GCP Dataproc, BigQuery, Dataflow, Cloud Composer.

Testing Tools: PyTest, Selenium, ScalaTest.

PROFESSIONAL EXPERIENCE

Sr Data Architect

Confidential, Venice, FL

Responsibilities:

  • Teamed with 5 developers to deliver APIs that enabled the analytics team to increase reporting speed from 20% to 30% in 2 weeks.
  • Automated ETL processes across billions of rows of data, saving 45 hours of manual work per month.
  • Read CSV and JSON files from Google Cloud Storage in GCP to retrieve the information required by clients and partners, using lambda functions in an event-driven architecture.
  • Deposited clean Parquet files into Google Cloud Storage to provide the information to partners and clients.
  • Worked on designing, loading, and querying data in BigQuery, Cloud Storage, Cloud SQL, and other GCP services.
  • Troubleshot and resolved data-related issues on GCP.
  • Built end-to-end ETL pipelines using Python and GCP.
  • Processed data using PySpark jobs in Databricks to clean the information and then store or integrate it into the enterprise data warehouse in BigQuery.
  • Kept transactional tables updated through multiple processes related to client services and payroll, such as incidences, bonuses, benefits, employee registrations, and terminations.
  • Created multiple Apache Airflow jobs in Python on Cloud Composer to orchestrate pipelines and synchronize them with the payroll calendars (see the sketch after this list).
  • Distributed data across specific pipelines related to the services of each client.
  • Built data processing models using PySpark and Spark SQL in Databricks to generate insights for specific purposes.
  • Processed and stored Parquet files in the data lake on GCS for easy access and analysis.
  • Worked under an Agile/Scrum methodology, using Jira to track tickets and project progress.
  • Created ETL pipelines to load data from multiple data sources into Databricks Delta Lake in a multi-layer architecture.
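
A minimal sketch of the kind of Cloud Composer orchestration job described above, assuming Airflow 2.x; the DAG id, schedule, and task callables are illustrative placeholders rather than the actual payroll pipeline:

    # Illustrative Airflow DAG for Cloud Composer (names and schedule are placeholders).
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def extract_payroll_files(**context):
        """Placeholder: read the CSV/JSON files for the current payroll period from GCS."""
        ...


    def load_to_bigquery(**context):
        """Placeholder: load the cleaned Parquet output into the BigQuery warehouse."""
        ...


    with DAG(
        dag_id="payroll_sync",                      # assumed name
        start_date=datetime(2023, 1, 1),
        schedule_interval="0 6 * * *",              # aligned with the payroll calendar in practice
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract", python_callable=extract_payroll_files)
        load = PythonOperator(task_id="load", python_callable=load_to_bigquery)

        extract >> load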

Lead Data Engineer

Confidential, Los Angeles, CA

Responsibilities:

  • Oversaw a team of 5 data engineers and collaborated with company management to recommend changes based on data history and tests.
  • Developed PySpark applications as ETL processes.
  • Developed Cloud-based Big Data Architecture using AWS.
  • Developed and maintained data pipelines, ingesting data from 12 disparate sources using Redshift, S3, and Python.
  • Created HBase tables, loaded them with data, and wrote HBase queries to process the data.
  • Established customer rapport through a recommended loyalty program that drove subscriptions up by 16%.
  • Communicated with business departments to understand needs and requests in order to build data pipelines for analyzing technical issues.
  • Created Hive and SQL queries to spot emerging trends by comparing data with historical metrics.
  • Created a cluster of Kafka brokers to serve structured data to Structured Streaming consumers.
  • Designed, developed, and tested Spark SQL clients with PySpark.
  • Established data collection via a REST API: built an HTTPS client-server connection, sent GET requests, and collected the responses in a Kafka producer.
  • Used Spark SQLContext to parse out the needed data, select features with the target information, and assign column names.
  • Decoded raw data from JSON and streamed it using the Kafka producer API.
  • Integrated Kafka with Spark Structured Streaming for real-time data processing (see the sketch after this list).
  • Conducted exploratory data analysis and built a management dashboard for weekly reporting.
  • Utilized transformations and actions in Spark to interact with data frames to show and process data.
  • Hands-on with Spark Core, Spark SQL, and the DataFrame/Dataset/RDD APIs.
  • Split JSON files into DataFrames to be processed in parallel for better performance and fault tolerance.
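
A minimal PySpark sketch of the Kafka and Structured Streaming integration described above, assuming the Kafka connector is on the Spark classpath; broker addresses, topic name, and the JSON schema are placeholders:

    # Sketch: consume JSON events from Kafka with Spark Structured Streaming.
    # Broker list, topic name, and schema are assumed placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType, TimestampType

    spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_type", StringType()),
        StructField("event_ts", TimestampType()),
    ])

    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092")
        .option("subscribe", "events")
        .option("startingOffsets", "latest")
        .load()
        # Kafka delivers bytes; decode the value column and parse the JSON payload.
        .select(from_json(col("value").cast("string"), event_schema).alias("payload"))
        .select("payload.*")
    )

    query = (
        events.writeStream
        .format("console")          # swap for a Delta or Parquet sink in a real pipeline
        .option("checkpointLocation", "/tmp/checkpoints/events")
        .start()
    )
    query.awaitTermination()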

Sr Data Engineer

Confidential, Providence, RI

Responsibilities:

  • Oversaw the migration from Oracle to Redshift, saving $250,000 and delivering a performance increase of 14%.
  • Performed ETL on streaming as well as batch data using PySpark and loaded the data into Amazon S3 buckets.
  • Used Amazon Kinesis and Kinesis Data Firehose to process real-time streaming data and store it in S3 buckets.
  • Integrated Glue with Kinesis for serverless processing of streaming data.
  • Used the AWS SDK for Python (Boto3) to write Kinesis producers (see the sketch after this list).
  • Created test environments on various Amazon EC2 instances.
  • Used the AWS console to manage and monitor EMR and different applications running on it.
  • Created different IAM roles to manage permissions in AWS.
  • Created AWS CloudFormation templates used alongside Terraform with existing plugins.
  • Developed AWS CloudFormation templates to create the custom infrastructure of the pipeline.
  • Developed multiple PySpark streaming and batch Spark jobs on EMR.
  • Implemented AWS Lambda functions to run scripts in response to events in Amazon DynamoDB tables or S3.
  • Decoded raw data and loaded it into JSON before sending the batched streaming file through the Kafka producer.
  • Specified nodes and performed data analysis queries on Amazon Redshift Clusters on AWS.
  • Hands-on with AWS Kinesis for processing excessively large amounts of real-time data.
  • Populated database tables via Amazon Kinesis Data Firehose and Amazon Redshift.
  • Utilized a cluster of three brokers to handle replication needs and allow for fault tolerance.
  • Used EC2, ECS, ECR, AWS CodePipeline, AWS CodeCommit, AWS CodeBuild, and AWS CodeDeploy to help teams automate the process of developing, testing, and deploying applications.
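
A minimal Boto3 sketch of a Kinesis producer like the ones referenced above; the stream name, region, and record layout are assumptions for illustration:

    # Sketch: put JSON records onto a Kinesis data stream with Boto3.
    # Stream name, region, and record fields are placeholders.
    import json

    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")


    def put_event(event: dict, stream_name: str = "clickstream-events") -> str:
        """Send one event to Kinesis; the partition key spreads records across shards."""
        response = kinesis.put_record(
            StreamName=stream_name,
            Data=json.dumps(event).encode("utf-8"),
            PartitionKey=str(event.get("user_id", "unknown")),
        )
        return response["SequenceNumber"]


    if __name__ == "__main__":
        put_event({"user_id": 42, "action": "page_view", "page": "/home"})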

Sr Data Engineer

Confidential, Zeeland, MI

Responsibilities:

  • Worked with Spark in Azure Databricks, configured with external libraries, to develop multiple data pipelines.
  • Created data pipelines that used Azure Data Factory to get data from SQL Server and load it into Azure Data Lake.
  • Processed the data in Databricks based on the requirements and loaded the processed data into Azure Synapse for business intelligence.
  • Processed data flows and streaming data using Spark in HDInsight, and then stored the results in the Data Lake.
  • Wrote SQL queries in Azure Synapse to analyze data in the enterprise data warehouse and created stored procedures.
  • Migrated on-premises architecture to Azure services.
  • Optimized data ingestion in Kafka Brokers within the Kafka cluster by partitioning Kafka Topics.
  • Consumed responses from a REST-based API in a Python script and forwarded them to a Kafka producer.
  • Performed data scrubbing and processing with Azure Data Factory and the Databricks Jobs API (see the sketch after this list).
  • Connected business intelligence tools such as Databricks and Power BI to the tables in the data warehouse.
  • Created stored procedures in Azure Synapse to process data in the Enterprise Datawarehouse and orchestrated them using Azure Data Factory.
  • Improved and fixed bugs in existing Azure HDInsight pipelines and flows.
  • Designed and implemented data validation and quality control processes, utilizing Scala and Spark's built-in functionality.
  • Modeled schema for Azure Synapse data warehouse.
  • Created and maintained clusters for Kafka, Hadoop, and Spark in HDInsight.
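
A minimal sketch of triggering an existing Databricks job through the Jobs API "run-now" endpoint, as referenced in the scrubbing bullet above; the workspace URL, token handling, job id, and notebook parameter are placeholders:

    # Sketch: trigger an existing Databricks job via the Jobs 2.1 "run-now" endpoint.
    # Workspace URL, token source, and job_id are assumed placeholders.
    import os

    import requests

    DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]    # e.g. the workspace URL
    DATABRICKS_TOKEN = os.environ["DATABRICKS_TOKEN"]  # personal access token


    def run_scrubbing_job(job_id: int, run_date: str) -> int:
        """Kick off the data-scrubbing job and return the run id."""
        resp = requests.post(
            f"{DATABRICKS_HOST}/api/2.1/jobs/run-now",
            headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
            json={"job_id": job_id, "notebook_params": {"run_date": run_date}},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["run_id"]


    if __name__ == "__main__":
        print(run_scrubbing_job(job_id=123, run_date="2023-01-31"))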

DBA

Confidential, Fort Mill, SC

Responsibilities:

  • Set up Hadoop clusters in pseudo-distributed and multi-node configurations.
  • Installed, configured, monitored, and administered Linux servers.
  • Worked with the DBA team for database performance issues, network-related issues on LINUX/UNIX servers, and with vendors regarding hardware-related issues.
  • Monitored CPU, memory, hardware, and software including raid, physical disk, multipath, filesystems, and networks using the Nagios monitoring tool.
  • Automated daily tasks using Bash scripts while documenting the changes in the environment and on each server, and analyzed the error logs, user logs, and /var/log/messages.
  • Created and modified users and groups with root permissions.
  • Administered local and remote servers using SSH daily.
  • Ensured the NameNode and Secondary NameNode of the Hadoop cluster were healthy and available.
  • Created and maintained Python scripts for automating build and deployment processes.
  • Created users, managed user permissions, maintained user and file system quotas, and installed and configured DNS.
  • Adhered to industry standards by securing systems, directory and file permissions, groups, and supporting user account management along with the creation of users.
  • Performed kernel and database configuration optimization such as I/O resource usage on disks.
  • Analyzed and monitored log files to troubleshoot issues (see the sketch below).
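
A minimal sketch of the kind of log-scanning automation used for troubleshooting, assuming syslog-style files; the log path and error patterns are placeholders:

    # Sketch: summarize error-like lines in a syslog-style file (path and patterns are placeholders).
    import re
    from collections import Counter
    from pathlib import Path

    LOG_PATH = Path("/var/log/messages")           # assumed location
    ERROR_PATTERN = re.compile(r"\b(error|fail(ed)?|critical)\b", re.IGNORECASE)


    def summarize_errors(log_path: Path = LOG_PATH, top_n: int = 10) -> Counter:
        """Count error-like lines per reporting process to spot noisy services."""
        counts: Counter = Counter()
        with log_path.open(errors="replace") as handle:
            for line in handle:
                if ERROR_PATTERN.search(line):
                    # syslog lines look like: "Jan  1 00:00:00 host process[pid]: message"
                    parts = line.split()
                    process = parts[4].rstrip(":") if len(parts) > 4 else "unknown"
                    counts[process] += 1
        return Counter(dict(counts.most_common(top_n)))


    if __name__ == "__main__":
        for process, count in summarize_errors().items():
            print(f"{process}: {count}")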
