Sr Data Engineer Resume
Miami, FL
SUMMARY
- 10+ years of IT experience as a Data Engineer/Analyst, with analytical programming in SQL, Python, Snowflake, and AWS.
- Good working knowledge of multi-tiered distributed environments and a solid understanding of the Software Development Lifecycle (SDLC), including Agile and Waterfall methodologies.
- Ability to develop reliable, maintainable, and efficient code in SQL, Hive, and Python.
- Experience importing and exporting data with Sqoop between HDFS and Relational Database Management Systems (RDBMS).
- Strong knowledge of various data warehousing methodologies and data modeling concepts.
- Experience with application development on Linux using Python, RDBMS, NoSQL, and ETL solutions.
- Experience designing and operating very large Data Warehouses.
- Experience developing ETL processes and frameworks, such as Matillion, for large-scale, complex datasets.
- Good knowledge of BI and data visualization tools such as Tableau.
- Expertise in writing Hadoop jobs to analyze data using MapReduce, Hive, and Pig.
- Knowledge of writing AWS CloudFormation templates and deploying AWS resources.
- Design and build data processing pipelines, selecting the appropriate tools and frameworks for the job.
- Understanding of data transformation and translation requirements and of which tools to leverage to meet them.
- Understanding of data pipelines and modern, cloud-based approaches to automating them.
- Good understanding of and hands-on experience in setting up and maintaining NoSQL databases like MongoDB and HBase.
- Familiarity with new advances in the data engineering space, such as EMR, and NoSQL technologies like DynamoDB.
- Ability to work with several Python packages, including Pandas, NumPy, Matplotlib, Beautiful Soup, and PySpark (see the sketch at the end of this summary).
- Hands-on experience with Continuous Integration and Continuous Deployment (CI/CD).
- Defined user stories and drove the Agile board in JIRA during project execution; participated in sprint demos and retrospectives.
- Experience in using various version control systems like Git.
- Strong analytical skills with the ability to collect, organize and analyze large amounts of information with attention to detail and accuracy.
- Good interpersonal, analytical, and presentation skills; able to work in self-managed and team environments.
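To illustrate the Python tooling above, here is a minimal PySpark/Pandas sketch; the S3 path and the column names (region, amount) are hypothetical and used only for demonstration.

```python
# Minimal sketch, assuming a hypothetical CSV feed of order events with
# columns order_id, region, and amount; path and names are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("summary-demo").getOrCreate()

# Read raw data into a Spark DataFrame.
orders = spark.read.csv("s3://example-bucket/orders/*.csv", header=True, inferSchema=True)

# Simple aggregation in PySpark.
totals = orders.groupBy("region").agg(F.sum("amount").alias("total_amount"))

# Hand off the small result set to Pandas for ad hoc analysis or plotting.
totals_pd = totals.toPandas()
print(totals_pd.head())
```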
TECHNICAL SKILLS
Programming Languages: Python, SQL, PHP, C++, Shell Scripting
SDLC Methodologies: Agile/SCRUM, Waterfall
Operating Systems: Windows, Linux, Unix
Python Libraries: PySpark, Pandas, Beautiful Soup, Jinja2, NumPy, SciPy, Matplotlib, unittest
Big Data Tools: Hadoop 3.3, Hive 3.2.1, Kafka 2.8, Scala, MapReduce, Sqoop
Cloud Tools: Azure and AWS (S3, RDS, DynamoDB, EMR, Redshift, Glue)
Databases: MS SQL Server, MySQL, PostgreSQL, Oracle, MongoDB, SQLite, DynamoDB
Version Controls: Git, GitHub, Bitbucket
Other Tools: Snowflake, Databricks, Hive, Spark, Matillion, ETL, JIRA, Docker, MS Excel, Data Pipeline, Data Modeling
PROFESSIONAL EXPERIENCE
Confidential
Sr Data Engineer
Responsibilities:
- Working as a Data Engineer, assisted in leading the plan, build, and run states within the Enterprise Analytics Team.
- Worked in Agile development environment in sprint cycles of two weeks by dividing and organizing tasks.
- Highly involved in data architecture and application design using cloud and big data solutions on AWS.
- Worked independently to develop and test Hive-based blocks to load and register data into the data lake and data warehouse.
- Migrated applications from an internal data center to the AWS cloud platform.
- Developed and used Python scripts to migrate data to AWS.
- Involved in Dataset Registration in Exchange for S3, One Lake, and Snowflake.
- Created ETL jobs using Matillion to load server data into the Snowflake data warehouse.
- Worked on data management and data integration.
- Implemented solutions for ingesting data from various sources and processing data at rest using big data technologies such as Hadoop, MapReduce, and Hive.
- Provided guidance to the development team using PySpark as the ETL platform.
- Developed data pipeline using Sqoop to ingest cargo data and customer histories into HDFS for analysis.
- Maintained existing ETL workflows, data management, and data query components.
- Worked extensively with AWS services, with a broad and in-depth understanding of each of them.
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Built REST APIs to easily add new analytics or issuers into the model.
- Developed MapReduce programs for parsing data and loading it into HDFS.
- Used Hive for transformations, event joins, and pre-aggregations before storing and registering data in Snowflake.
- Used Hive SQL to analyze the data and identify correlations.
- Used Hive scripts and Amazon Elastic MapReduce for unit testing.
- Imported data from DynamoDB to Redshift in batches using AWS Batch with the TWS scheduler.
- Performed data analysis and data validation in Snowflake and the Hue editor.
- Performed capacity planning for cloud infrastructure using the AWS console, EC2 instances, AWS EMR, and S3.
- Used Databricks as the platform for high-scale analytics with Spark.
- Worked on the Analytics Infrastructure team to develop a stream filtering system on top of Apache Kafka.
- Used Python scripts to update content in the database and manipulate files.
- Involved in supporting a cloud-based data warehouse environment such as Snowflake.
- Implemented Python code to retrieve and manipulate data.
- Worked with the data governance team to maintain data models, metadata, and data dictionaries, and to define source fields and their definitions.
- Extensively involved in managing and reviewing Hadoop log files.
- Created monitors, alarms, notifications, and logs for Lambda functions, Glue jobs, and EC2 hosts using CloudWatch.
- Designed and built ETL pipelines to automate ingestion of structured and unstructured data.
- Demonstrated a full understanding of the Fact/Dimension data warehouse design model, including star and snowflake design methods.
- Involved in file movements between HDFS and AWS S3 and extensively worked with S3 bucket in AWS.
- Responsible for the design, implementation, and architecture of very large-scale data intelligence solutions around the Snowflake data warehouse.
- Worked on Lambda functions that aggregate data from incoming events and store the results in Amazon DynamoDB (see the sketch after this role).
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Spark and Databricks.
- Used Amazon EMR for MapReduce jobs and tested locally using Jenkins.
- Created a load balancer on AWS EC2 for a stable cluster and services that provide fast and effective data processing.
- Developed APIs using AWS Lambda to manage servers and run code in AWS.
- Designed and implemented REST APIs to access the Snowflake database platform.
- Created different types of reports, including cross-tab, conditional, drill-down, OLAP, and sub-reports.
- Worked on Oozie workflow engine for job scheduling.
- Developed tools using Python, shell scripting, and XML to automate routine tasks.
- Wrote many T-SQL stored procedures to perform specific tasks per the requirements.
- Presented data visually using Tableau.
Environment: Agile, Snowflake, Hadoop 3.3, Python 3.1, AWS, Matillion, Jenkins 2.3, DynamoDB, HDFS, Hive 3.2.1, Lambda, EC2, EMR, S3, OLAP, REST API, T-SQL, Oozie, MapReduce, XML, Kafka 2.8, Tableau.
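A rough sketch of the Lambda-to-DynamoDB aggregation described in this role; the event_aggregates table name and the event payload fields (source, count) are assumptions for illustration, not the production schema.

```python
# Minimal sketch, assuming a hypothetical "event_aggregates" table and an event
# payload whose records carry "source" and "count" fields.
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("event_aggregates")  # hypothetical table name

def lambda_handler(event, context):
    # Aggregate counts per source across the incoming batch of records.
    totals = {}
    for record in event.get("Records", []):
        source = record.get("source", "unknown")
        totals[source] = totals.get(source, 0) + int(record.get("count", 1))

    # Persist one aggregate item per source.
    for source, count in totals.items():
        table.put_item(Item={"source": source, "event_count": count})

    return {"aggregated_sources": len(totals)}
```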
Confidential, Miami FL
Sr. Data Engineer
Responsibilities:
- Worked independently in the development, testing, implementation, and maintenance of systems of moderate-to-large size and complexity.
- Followed the Agile methodology through the different phases of the Software Development Life Cycle (SDLC).
- Worked extensively with AWS services, with a broad and in-depth understanding of each of them.
- Implemented business logic in backend Python to achieve optimal results.
- Responsible for ETL (Extract, Transform, and Load) processes to bring data from multiple sources into a single warehouse environment.
- Worked with standard Python packages such as boto and boto3 for AWS.
- Investigated data sources to identify new data elements needed for data integration.
- Worked on delivery of data and analytics applications involving structured and unstructured data on Hadoop-based platforms.
- Carried out various mathematical operations and calculations using Python libraries.
- Involved in designing and deploying AWS Solutions using EC2, S3, RDS and Redshift.
- Created ETL pipelines using Spark and Hive to ingest data from multiple sources.
- Implemented solutions for ingesting data from various sources and processing data at rest using big data tools, including Hadoop, MapReduce, Pig, and Hive.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Implemented data validation using MapReduce programs to remove unnecessary records before moving data into Hive tables.
- Proficient in wrangling data and creating data pipelines using fast, efficient Python code.
- Used Git for version control and deployed the project to AWS.
- Configured inbound/outbound rules in AWS security groups according to requirements.
- Used AWS Glue for data transformation, validation, and cleansing.
- Familiar with DBMS table design, loading, and data modeling, with experience in SQL.
- Designed and developed data management system using MySQL.
- Worked with Analysts on various requirements gathered in JIRA.
- Used EMR for data pre-analysis by creating EC2 instances.
- Designed and documented Use Cases, Activity Diagrams, Sequence Diagrams, OOD (Object Oriented Design) using Visio.
- Created complex program units using PL/SQL records and collection types.
- Developed Sqoop scripts to extract data from MySQL and load it into HDFS.
- Effectively loaded real time data from various data sources into HDFS using Kafka.
- Developed Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
- Created several Databricks Spark jobs with PySpark to perform table-to-table operations.
- Developed AWS Lambda Python functions using S3 triggers to automate workflows (see the sketch after this role).
- Prepared Dashboards using calculations, parameters, calculated fields, groups, sets and hierarchies in Tableau.
Environment: Hadoop 3.0, Agile, AWS, Python, MS Visio, JIRA, MySQL, HDFS, Kafka 1.1, Git, EC2, S3, Spark, OLTP, ODS, MongoDB, Tableau.
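A minimal sketch of the S3-triggered Lambda automation mentioned above, assuming a hypothetical downstream Glue job named nightly_transform; bucket, key, and argument names are illustrative only.

```python
# Minimal sketch: an S3-triggered Lambda that starts a (hypothetical) Glue job
# for each newly arrived object.
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    started = []
    # Each S3 event record carries the bucket and object key that triggered it.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Kick off a downstream Glue job for the new object (job name is assumed).
        response = glue.start_job_run(
            JobName="nightly_transform",
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
        started.append(response["JobRunId"])

    return {"job_runs": started}
```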
Confidential - New York, NY
Data Engineer
Responsibilities:
- Worked as a Data Engineer with the analysis and management teams and supported them based on their requirements.
- Worked on Agile, testing methodologies, resource management and scheduling of tasks.
- Worked on all phases of data warehouse development lifecycle, from gathering requirements to testing, implementation, and support.
- Worked with Amazon Web Services (AWS) for improved efficiency of storage and fast access.
- Worked on Data migration project from Teradata to Snowflake.
- Developed data pipeline using Sqoop to ingest cargo data and customer histories into HDFS for analysis.
- Worked on AWS Redshift and RDS for implementing models and data.
- Implemented data pipelines for big data processing using Spark transformations and the Python API on clusters in AWS.
- Specified nodes and performed data analysis queries on Amazon Redshift clusters in AWS.
- Involved in developing ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL (see the sketch after this role).
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Used Oozie to automate data loading into HDFS and Pig to pre-process the data.
- Called Data Rules and Transform Rules functions using Informatica Stored Procedure Transformation.
- Implemented data models, database designs, data access, table maintenance and code changes together with our development team.
- Joined various tables in Cassandra using Spark and Scala and ran analytics on top of them.
- Analyzed data from multiple data sources and developed a process to integrate the data into a single, consistent view.
- Performed data analytics on the data lake using PySpark on the Databricks platform.
- Added support for Amazon S3 and RDS to host static/media files and the database in the AWS cloud.
- Day-to-day responsibilities included developing ETL pipelines in and out of the data warehouse and developing reports using advanced SQL queries in Snowflake.
- Involved in interactive BI dashboards and Tableau Publisher.
- Created multiple dashboards in Tableau for various business needs.
- Used Excel VLOOKUPs to examine customer data and created pivot tables to easily access and validate data.
Environment: Agile, Snowflake, AWS, Spark, HDFS, Sqoop, Oozie, Scala, Cassandra, Pig, HBase, Tableau, Excel, PySpark, Databricks, Informatica.
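A minimal sketch of a Python-driven Snowflake load step like the pipelines described above, using the snowflake-connector-python package; the account, credentials, stage (@STG_ORDERS), and target table (ORDERS) are placeholders, not the actual environment.

```python
# Minimal sketch: load staged files into Snowflake and run a simple post-load check.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",   # placeholder
    user="etl_user",             # placeholder
    password="***",              # placeholder
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)

try:
    cur = conn.cursor()
    # Load files already staged (via an external stage) into the target table.
    cur.execute("COPY INTO ORDERS FROM @STG_ORDERS FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)")
    # Simple post-load row count used for validation/reporting.
    cur.execute("SELECT COUNT(*) FROM ORDERS")
    print("rows loaded:", cur.fetchone()[0])
finally:
    conn.close()
```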
Confidential - San Jose, CA
Data Analyst/Engineer
Responsibilities:
- Worked as a Data Analyst/Engineer to generate data models using Erwin and developed a relational database system.
- Analyzed data in place by mounting Azure Data Lake and Blob Storage to Databricks (see the sketch after this role).
- Involved in extracting and mining data for analysis to aid in solving business problems.
- Used Azure Data Lake as the source and pulled data using PolyBase.
- Formulated SQL queries, Aggregate Functions, and database schema to automate information retrieval.
- Involved in manipulating data to fulfill analytical and segmentation requests.
- Managed data privacy and security in Power BI.
- Wrote complex SQL queries for data analysis to meet business requirements.
- Used data visualization tools and techniques to share data effectively with business partners.
- Designed and implemented a data lake to consolidate data from multiple sources, using Hadoop stack technologies like Sqoop and Hive.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Reviewed code and system interfaces and extracts to handle the migration of data between systems/databases.
- Developed a Conceptual model using Erwin based on requirements analysis.
- Involved in ETL mapping documents in data warehouse projects.
- Involved in loading data into Teradata from legacy systems and flat files using complex MultiLoad and FastLoad scripts.
- Created Azure Data Factory (ADF) pipelines using PolyBase and Azure Blob Storage.
- Developed star- and snowflake-schema-based dimensional models to build the data warehouse.
- Developed T-SQL code, stored procedures, views, functions, and other database objects to supply data for downstream applications and fulfill business requirements.
- Used SQL Server Integration Services (SSIS) to extract, transform, and load data into the target system from multiple sources.
- Implemented data ingestion and handling clusters in real time processing using Kafka.
- Created and tuned PL/SQL procedures and SQL queries for data validation in the ETL process.
Environment: Azure Data Lake, Erwin, Power BI, Hadoop, HBase, Teradata, T-SQL, SSIS, PL/SQL.
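A minimal sketch of mounting Azure Blob Storage to Databricks as described above; it assumes a Databricks notebook context (where dbutils and spark are predefined), and the storage account, container, and secret scope names are placeholders.

```python
# Minimal sketch: mount an Azure Blob container to Databricks and read it in place.
storage_account = "examplestorage"   # placeholder
container = "raw"                    # placeholder
mount_point = "/mnt/raw"             # placeholder

# Mount only if not already mounted.
if not any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
    dbutils.fs.mount(
        source=f"wasbs://{container}@{storage_account}.blob.core.windows.net",
        mount_point=mount_point,
        extra_configs={
            f"fs.azure.account.key.{storage_account}.blob.core.windows.net":
                dbutils.secrets.get(scope="etl-scope", key="storage-key")  # placeholder scope/key
        },
    )

# Read the mounted data with Spark for analysis where it lives.
df = spark.read.parquet(f"{mount_point}/sales/")
df.printSchema()
```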
Confidential
Data Analyst
Responsibilities:
- Performed Data Analysis using SQL queries on source systems to identify data discrepancies and determine data quality.
- Performed extensive data validation and verification against the data warehouse and debugged SQL statements and stored procedures for business scenarios (see the sketch after this role).
- Designed and developed Tableau dashboards using stacked bars, bar graphs, scatter plots, and Gantt charts.
- Familiar with DBMS table design, loading, and data modeling, with experience in SQL.
- Worked with ER/Studio for conceptual, logical, and physical data modeling and for generating DDL scripts.
- Handled performance requirements for databases in OLTP and OLAP models.
- Analyzed the data consuming the most resources and made back-end code changes using PL/SQL stored procedures and triggers.
- Performed data completeness, correctness, data transformation and data quality testing using SQL.
- Involved in designing Business Objects universes and creating reports.
- Conducted design walkthrough sessions with the Business Intelligence team to ensure that reporting requirements were met.
- Prepared complex T-SQL queries, views, and stored procedures to load data into the staging area.
- Wrote UNIX shell scripts to invoke all the stored procedures, parse the data and load into flat files.
- Created reports analyzing a large-scale database using Microsoft Excel analytics within the legacy system.
Environment: SQL, PL/SQL, OLAP, OLTP, UNIX, MS Excel, T-SQL.
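The row-count and completeness checks described in this role can be illustrated with a short Python sketch; an in-memory SQLite database stands in for the actual source and warehouse systems, and the table and column names are hypothetical.

```python
# Minimal sketch: SQL-based validation (row-count reconciliation and NULL checks),
# using SQLite purely as a stand-in for the real source/warehouse databases.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Stand-in source and warehouse tables with a deliberate discrepancy.
cur.executescript("""
    CREATE TABLE src_customers (id INTEGER, email TEXT);
    CREATE TABLE dw_customers  (id INTEGER, email TEXT);
    INSERT INTO src_customers VALUES (1, 'a@x.com'), (2, 'b@x.com'), (3, NULL);
    INSERT INTO dw_customers  VALUES (1, 'a@x.com'), (2, 'b@x.com');
""")

# Row-count reconciliation between source and warehouse.
src_count = cur.execute("SELECT COUNT(*) FROM src_customers").fetchone()[0]
dw_count = cur.execute("SELECT COUNT(*) FROM dw_customers").fetchone()[0]
print(f"source={src_count}, warehouse={dw_count}, missing={src_count - dw_count}")

# Completeness check: flag NULL emails in the source feed.
null_emails = cur.execute(
    "SELECT COUNT(*) FROM src_customers WHERE email IS NULL"
).fetchone()[0]
print(f"source rows with NULL email: {null_emails}")

conn.close()
```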