Data Engineer/Data Analyst Resume
New Jersey
SUMMARY
- 7 years of experience as a Data Engineer/Data Analyst using Python, NumPy, Pandas, AWS, Postgres, Kafka, Cassandra, and MongoDB.
- Hands-on experience using Pandas DataFrames, NumPy, Matplotlib, and Seaborn to create correlation, bar, and time-series plots.
- Skilled in Tableau Desktop, creating visualizations such as bar charts, line charts, scatter plots, and pie charts.
- Good experience conducting exploratory data analysis and data mining and working with statistical models.
- Worked with Python and Bash scripting to automate tasks and create data pipelines.
- Good experience designing and creating data ingestion pipelines using Apache Kafka.
- Experienced with data preprocessing techniques such as handling missing data, encoding categorical data (dummy variables), and feature scaling (see the illustrative sketch at the end of this summary).
- Good knowledge of data modeling: creating schemas (snowflake/star) and tables (fact/dimension) in a data warehouse. Experienced with 3NF, normalization, and denormalization of tables depending on the use case.
- Well versed with big data on AWS cloud services, i.e., EC2, S3, Glue, Athena, DynamoDB, and Redshift.
- Experienced in query optimization in MySQL and Teradata using EXPLAIN commands to improve query performance.
- Proficient with SQL/NoSQL databases such as MongoDB, Cassandra, MySQL, and PostgreSQL.
- Exposure to various AWS services such as Lambda, VPC, IAM, Load Balancing, CloudWatch, SNS, SQS, and Auto Scaling.
- Good experience with data pipeline/workflow management tools such as AWS Kinesis and Apache Airflow.
- Experienced in performing ETL operations using Apache Airflow DAGs and Informatica PowerCenter to load data into data warehouses.
- Knowledge of container orchestration platforms such as Kubernetes; experienced in building images with Docker and deploying them to a private registry.
- Experienced with the full software development life cycle, architecting scalable platforms, object-oriented programming, database design, and Kanban methodologies.
- Experience in UNIX shell scripting for processing large volumes of data from varied sources and loading it into databases such as Teradata.
- Wrote subqueries, stored procedures, triggers, cursors, and functions on MongoDB, MySQL, and PostgreSQL databases.
- Thorough understanding of providing specifications using Waterfall and Agile software methodologies to model systems and business processes.
- Experience in design and development of ETL methodology supporting data migration, data transformation, and processing in a corporate-wide ETL solution using Teradata.
- Worked on various applications using Python-integrated IDEs such as Jupyter, Spyder, Eclipse, VS Code, IntelliJ, Atom, and PyCharm.
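A minimal sketch of the preprocessing steps listed above (missing-value handling, dummy variables, feature scaling), assuming pandas and scikit-learn; the file and column names are hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical input: numeric 'age' and 'income' columns plus a categorical 'segment' column.
df = pd.read_csv("customers.csv")

# Handle missing data: median for numeric fields, mode for the categorical field.
df["age"] = df["age"].fillna(df["age"].median())
df["income"] = df["income"].fillna(df["income"].median())
df["segment"] = df["segment"].fillna(df["segment"].mode()[0])

# Categorical data: one-hot encode, dropping the first level to avoid the dummy-variable trap.
df = pd.get_dummies(df, columns=["segment"], drop_first=True)

# Feature scaling: standardize the numeric features.
scaler = StandardScaler()
df[["age", "income"]] = scaler.fit_transform(df[["age", "income"]])
```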
TECHNICAL SKILLS
Libraries: Keras, TensorFlow, Gym, scikit-learn, Matplotlib, Seaborn, NumPy, Pandas, Boto3, Beautiful Soup, PySpark, Gurobi
SQL/NoSQL: PostgreSQL, MongoDB, Cassandra, MySQL, MS SQL, Kafka
Language: Python, R, C, C++.
Operating System: Windows, Red Hat Linux
Version Control: Git, GitHub, SVN
Architecture: Relational DBMS, OLAP, OLTP.
Reporting Tools: Power BI, Tableau, SSRS (SQL Server Reporting Services)
ETL Tools: Apache Airflow, Informatica, SSIS (SQL Server Integration Services)
PROFESSIONAL EXPERIENCE
Confidential - New Jersey
Data Engineer/Data Analyst
Responsibilities:
- Utilized AWS EMR with Spark to perform batch processing operations on various big data sources.
- Installed applications on AWS EC2 instances and configured S3 buckets for storage.
- Created data pipelines to extract data from various APIs and ingest it into S3 buckets using Apache Airflow (see the DAG sketch after this list).
- Deployed Lambda functions triggered by S3 bucket events to perform data transformations and load the data into AWS Redshift (see the Lambda sketch after this list).
- Worked with Linux EC2 instances and RDBMS databases.
- Developed Python and shell scripts to run Spark jobs for batch processing data from various sources.
- Working knowledge of Amazon's Elastic Compute Cloud (EC2) for computational tasks and Simple Storage Service (S3) for object storage.
- Experience with AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
- Extracted data from AWS Redshift using Spark and performed data analysis.
- Developed a POC to perform ETL operations using AWS Glue to load Kinesis stream data into S3 buckets.
- Performed data cleaning, data quality checks, and data governance for incremental loads.
- Normalized data and created correlation plots and scatter plots to find underlying patterns.
- Filtered and cleaned data by reviewing reports, printouts, and performance indicators to locate and correct code problems.
- Tested dashboards to ensure data matched business requirements and to identify discrepancies in the underlying data.
- Created data pipelines using AWS Step Functions, implemented state machines, and ran pipelines on different schedules to process data.
- Developed and implemented databases, data collection systems, data analytics, and other strategies that optimize statistical efficiency and quality.
- Worked with management to prioritize business and information needs.
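A minimal sketch of the kind of Airflow DAG described above (API extract landed in S3), assuming Airflow 2.x, requests, and boto3; the API URL, bucket, and task names are hypothetical:

```python
from datetime import datetime
import json

import boto3
import requests
from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_api_to_s3(**context):
    # Pull a page of records from a hypothetical REST API.
    resp = requests.get("https://api.example.com/v1/orders", timeout=30)
    resp.raise_for_status()

    # Land the raw payload in S3, partitioned by the logical run date.
    key = f"raw/orders/{context['ds']}.json"
    boto3.client("s3").put_object(
        Bucket="example-data-lake",  # hypothetical bucket
        Key=key,
        Body=json.dumps(resp.json()),
    )


with DAG(
    dag_id="api_to_s3_ingest",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="extract_api_to_s3", python_callable=extract_api_to_s3)
```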
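And a hedged sketch of the S3-triggered Lambda pattern mentioned above, issuing a Redshift COPY through the Redshift Data API (one possible approach; cluster, database, role, and table names are hypothetical):

```python
import boto3

redshift_data = boto3.client("redshift-data")


def handler(event, context):
    """Triggered by an S3 ObjectCreated event; loads each new object into Redshift."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # COPY the newly arrived file into a staging table (names are hypothetical).
        sql = (
            f"COPY staging.orders FROM 's3://{bucket}/{key}' "
            "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role' "
            "FORMAT AS JSON 'auto';"
        )
        redshift_data.execute_statement(
            ClusterIdentifier="example-cluster",
            Database="analytics",
            DbUser="etl_user",
            Sql=sql,
        )
```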
Environment: Python, AWS, Jira, Git, CI/CD, Docker, Kubernetes, Web Services, Spark Streaming API, Kafka, Cassandra, MongoDB, JSON, Bash scripting, Linux, SQL, Apache Airflow.
Confidential - Sioux Falls, SD
Data Engineer
Responsibilities:
- Implemented Spark using Python and utilized DataFrames and the Spark SQL API for processing and querying data.
- Developed Spark applications using the Python driver and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Worked with various data formats such as Parquet, JSON, and CSV using Spark.
- Experienced in real-time streaming with Kafka as a data pipeline, using the Spark Streaming module (see the streaming sketch after this list).
- Consumed Kafka messages and loaded the data into a Cassandra cluster deployed in containers.
- Developed preprocessing jobs using Spark DataFrames to flatten JSON files (see the flattening sketch after this list).
- Ingested data into a MySQL RDBMS, performed transformations, and exported the transformed data to Cassandra.
- Assisted in keyspace, table, and secondary index creation in the Cassandra database.
- Performed query optimization of tables through load testing using the cassandra-stress tool.
- Created various POCs in Python using the PySpark module and MLlib.
- Worked with various teams to deploy containers on site using Kubernetes to run Cassandra clusters in a Linux environment.
- Good knowledge of Kubernetes architecture, including the scheduler, pods, nodes, the kubectl/API server, and the etcd database.
- Experience in creating tables in Cassandra clusters, building images using Docker, and deploying the images.
- Assisted in developing schemas and tables for the Cassandra cluster to ensure good query performance for the front-end application.
- Experience in managing a MongoDB environment from availability, performance, and scalability perspectives.
- Worked on ad hoc queries, indexing, replication, load balancing, and aggregation in MongoDB.
- Used GitHub for version control.
- Worked in a UNIX environment.
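A hedged sketch of a Kafka-to-Cassandra streaming job of the kind described above, using Spark Structured Streaming; it assumes the spark-sql-kafka and spark-cassandra-connector packages are on the classpath, and the broker, topic, keyspace, and schema are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka_to_cassandra").getOrCreate()

# Hypothetical event schema carried in the Kafka message value.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("event_ts", TimestampType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "user-events")                # hypothetical topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)


def write_to_cassandra(batch_df, batch_id):
    # Requires the spark-cassandra-connector; keyspace and table are hypothetical.
    (batch_df.write.format("org.apache.spark.sql.cassandra")
     .options(keyspace="analytics", table="user_events")
     .mode("append")
     .save())


events.writeStream.foreachBatch(write_to_cassandra).start().awaitTermination()
```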
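And a minimal sketch of flattening nested JSON with Spark DataFrames, as mentioned above; the paths and field names are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.appName("flatten_json").getOrCreate()

# Hypothetical nested JSON: {"order_id": ..., "customer": {...}, "items": [...]}
orders = spark.read.json("s3a://example-data-lake/raw/orders/")

flat = (
    orders
    # Explode the array of line items into one row per item.
    .withColumn("item", explode(col("items")))
    # Promote nested struct fields to top-level columns.
    .select(
        col("order_id"),
        col("customer.id").alias("customer_id"),
        col("customer.name").alias("customer_name"),
        col("item.sku").alias("sku"),
        col("item.qty").alias("qty"),
    )
)

flat.write.mode("overwrite").parquet("s3a://example-data-lake/curated/orders/")
```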
Environment: Python, Spark, Kafka, JSON, GitHub, Linux, Flask, Varnish, Nginx, REST, CI/CD, Kubernetes, Helm, MongoDB, Cassandra.
Confidential - Norfolk, VA
Data Engineer
Responsibilities:
- Involved in developing Python scripts and using Informatica and other ETL tools for extraction, transformation, and loading of data into Teradata.
- Worked with a team of business analysts on requirements gathering, business analysis, and project coordination for creating reports.
- Performed unit and integration testing of Informatica sessions, batches, and target data.
- Responsible for using Autosys and Workflow Manager tools to schedule Informatica jobs.
- Responsible for creating workflows and sessions using Informatica Workflow Manager and monitoring workflow runs and statistics in Informatica Workflow Monitor.
- Developed complex Informatica mappings using transformations such as Connected and Unconnected Lookup, Router, Filter, Aggregator, Expression, Normalizer, and Update Strategy for large volumes of data.
- Involved in testing, debugging, validation, and performance tuning of the data warehouse; helped develop optimal solutions for data warehouse deliverables.
- Moved data from source systems to different schemas based on dimension and fact tables using Type 1 and Type 2 slowly changing dimensions (see the simplified Type 2 sketch after this list).
- Interacted with key users, assisted them with various data issues, understood their data needs, and helped with data analysis.
- Performed query optimization for various SQL tables using the Teradata EXPLAIN command.
- Developed procedures to populate the customer data warehouse with transaction data and cycle and monthly summary data.
- Experienced in the Linux environment, scheduling jobs and file transfers.
- Very good understanding of database skew, PPI, join methods, aggregation, and hashing.
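A simplified pandas illustration of the Type 2 slowly-changing-dimension logic described above (the actual work used Informatica mappings; the column names and tracked attribute are hypothetical):

```python
import pandas as pd

HIGH_DATE = pd.Timestamp("9999-12-31")


def apply_scd_type2(dim: pd.DataFrame, incoming: pd.DataFrame, load_date: pd.Timestamp) -> pd.DataFrame:
    """Expire changed rows and insert new versions (simplified Type 2 logic)."""
    dim = dim.copy()
    current = dim[dim["end_date"] == HIGH_DATE]
    merged = incoming.merge(current, on="customer_id", how="left", suffixes=("", "_old"))

    # Rows whose tracked attribute changed (or brand-new keys) become new versions.
    changed = merged[merged["address"] != merged["address_old"]]

    # Expire the previously current versions of changed keys.
    dim.loc[
        (dim["end_date"] == HIGH_DATE) & dim["customer_id"].isin(changed["customer_id"]),
        "end_date",
    ] = load_date

    # Append the new versions with an open-ended validity window.
    new_rows = changed[["customer_id", "address"]].assign(start_date=load_date, end_date=HIGH_DATE)
    return pd.concat([dim, new_rows], ignore_index=True)
```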
Environment: Informatica PowerCenter, Teradata, XML, flat files, cron jobs, Linux, Bash shell, Python scripting.
Confidential
Data Analyst
Responsibilities:
- Created data visualizations and developed dashboards and stories in Tableau.
- Utilized various chart types such as scatter plots, bar charts, pie charts, and heat maps in Tableau for data analysis.
- Analyzed KPI results frequently to assist in the development of performance improvement concepts.
- Experience using Excel pivot charts and VBA tools to create customized reports and analyses.
- Assisted with sizing, query optimization, buffer tuning, backup and recovery, installations, upgrades, and security, along with other administration functions, as part of the profiling plan.
- Ensured production data was replicated into the data warehouse from the processing databases without data anomalies.
- Designed databases for referential integrity and was involved in logical design planning.
- Analyzed code to improve query optimization and verify that tables use indexes.
- Created and tested MySQL programs, forms, reports, triggers, and procedures for the data warehouse.
- Involved in troubleshooting and fine-tuning databases for performance and concurrency.
- Automated the code release process, bringing the total time for code releases from 8 hours to 1 hour.
- Developed a fully automated continuous integration system using Git, MySQL, and custom tools developed in Python and Bash.
- Working experience with Tableau Desktop, generating reports by writing SQL queries (see the reporting sketch after this list).
- Played a key role in a department-wide transition from Subversion to Git, which increased efficiency for the development community.
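A minimal sketch of pulling KPI data from MySQL into pandas to feed a report or Tableau data source, as described above; it assumes SQLAlchemy with the PyMySQL driver, and the connection string, table, and columns are hypothetical:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string for the reporting warehouse.
engine = create_engine("mysql+pymysql://report_user:***@db.example.com:3306/warehouse")

# Hypothetical KPI query against a fact table.
query = """
    SELECT region,
           SUM(revenue) AS revenue,
           COUNT(DISTINCT customer_id) AS customers
    FROM fact_orders
    WHERE order_date >= '2023-01-01'
    GROUP BY region
"""

kpis = pd.read_sql(query, engine)
kpis.to_csv("monthly_kpis.csv", index=False)  # extract consumed by a Tableau data source
```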
Environment: Python, MySQL, Tableau, JSON, GitHub, Linux.