Data Engineer Resume
Jacksonville, Florida
SUMMARY
- Over 7 years of professional experience in the analysis, design, and development of enterprise-grade software applications.
- 5+ years of experience in Big Data technologies.
- 3+ years of hands-on experience with Apache Spark, AWS EMR, and S3.
- Proven ability to excel in fast-paced development environments using frameworks/tools such as Eclipse, Maven, JFrog, Git, Bitbucket, Jenkins, OpenShift, Docker, and MongoDB.
- Well appreciated for accomplishments on previous assignments.
- Designed and implemented data application solutions on the Hadoop ecosystem.
- Good exposure to the Software Development Life Cycle and other project management activities.
- Worked on the AWS EMR, Cloudera, and MapR Hadoop distributions.
- Worked in the finance, healthcare, and telecommunications domains.
- Experience working with microservices and Spring Boot applications, data governance, and data lineage.
- Worked in Agile (Scrum) methodologies.
- Quick learner and self-starter, able to pick up new technologies quickly.
- Hands-on experience with Unified Data Analytics on Databricks, including the Databricks workspace user interface, managing Databricks notebooks, and Delta Lake with Python and Spark SQL.
- Good understanding of Spark architecture with Databricks and Structured Streaming; experience setting up Databricks on AWS and Microsoft Azure, using the Databricks workspace for business analytics, managing clusters in Databricks, and managing the machine learning lifecycle.
PROFESSIONAL EXPERIENCE
Confidential, Jacksonville, Florida
Data Engineer
Responsibilities:
- Involved in gathering business requirements, analyzing the project, and creating use cases and design documents for new requirements.
- Converting long-running Hive queries to Spark Scala to improve performance.
- Working in an Agile environment, providing support for implemented projects.
- Managing microservices and Spring Boot applications; involved in building, testing, and deploying applications for the data governance and data lineage system.
- Experience building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinating tasks among the team.
- Worked with the data lake team to ingest data from various sources into a data lake using tools such as Teradata, Hive, and Spark, and to make it available to different ML applications on the Hortonworks Data Platform.
- Wrote Spark jobs to generate standard and customized analytic reports.
- Experience using the Stackdriver service and Dataproc clusters in GCP to access logs for debugging.
- Experience with GCP Dataproc, GCS, Cloud Functions, BigQuery, and Azure Data Factory.
- Worked with DevOps team to automate EMR cluster creation and destruction.
- Developed an ingestion and processing platform for terabytes of data using tools such as Spark, Spark Streaming, Kafka, and Cassandra.
- Using DevOps tools such as Jenkins and JFrog to automate, deploy, and manage CI/CD pipelines.
- Designing architecture and building new features into the existing product.
- Deployment architecture, performance, load balancing, and business continuity design.
- Experience developing Spark applications using Spark-SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns, as sketched below.
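A minimal sketch of this kind of Databricks Spark-SQL job, assuming hypothetical mount paths and column names (customer_id, segment, session_minutes) purely for illustration:

```scala
// Illustrative only: read two file formats, join, and aggregate usage metrics.
// In a Databricks notebook the `spark` session is already provided.
import org.apache.spark.sql.functions._

val usage     = spark.read.parquet("/mnt/raw/usage/")     // hypothetical Parquet source
val customers = spark.read.json("/mnt/raw/customers/")    // hypothetical JSON source

usage.join(customers, Seq("customer_id"))
  .groupBy("segment")
  .agg(count("*").as("events"), avg("session_minutes").as("avg_session_minutes"))
  .write.mode("overwrite").parquet("/mnt/curated/usage_by_segment/")
```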
Confidential, New Jersey
Data Engineer
Responsibilities:
- Proficient in SQL, with experience developing Spark programs in Scala.
- Operating and scaling our infrastructure in AWS.
- Loaded data from MapR-DB into Spark DataFrames and used Spark-SQL and native core Scala to explore data insights.
- Extensive use of the Cloud Shell SDK in GCP to configure and deploy services such as Dataproc, Cloud Storage, and BigQuery.
- Implemented Spark applications in Scala using the Spark Core, Spark Streaming, and Spark SQL APIs for faster data processing than MapReduce in Java.
- Configured Spark Streaming to consume ongoing data from MapR Streams and store it in HDFS.
- Involved in the design and development of Hive DDLs.
- Designed data pipelines to extract the data from disparate sources.
- Working on enhancing data pipeline performance in terms of speed as well as improving data quality to achieve 100% referential integrity.
- Developed multi-cloud strategies making better use of GCP (for its PaaS) and Azure (for its SaaS).
- Used Spark-SQL to load JSON data, create schema RDDs, and load them into Hive tables, and handled structured data using Spark-SQL (see the sketch after this list).
- Developed and deployed the outcome using Spark and Scala code on a Hadoop cluster running on GCP.
- Strong knowledge of Software Development Life Cycle (SDLC) and expertise in detailed design documentation.
- Worked on setting up batch intervals, split intervals, and window intervals in Spark Streaming.
- Implemented data quality checks using Spark Streaming, flagging records as passing or bad.
- Good understanding of the Control-M graphical user interface and command-line interface for defining, scheduling, and monitoring jobs.
- Involved in migrating the on-premises Hadoop system to GCP (Google Cloud Platform).
- Worked on various RDBMS platforms such as Oracle and MySQL.
- Troubleshot and reviewed data backups and Hadoop log files.
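A minimal sketch of the JSON-to-Hive flow referenced above, assuming a hypothetical HDFS input path, partition column, and table name (analytics.events):

```scala
// Illustrative only: infer a schema from JSON with Spark-SQL and persist it as a Hive table.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("json-to-hive")
  .enableHiveSupport()        // required so saveAsTable writes to the Hive metastore
  .getOrCreate()

val events = spark.read.json("hdfs:///data/raw/events/")   // hypothetical input path
events.createOrReplaceTempView("events_raw")

// Handle the structured data with Spark-SQL, then store it as a partitioned Hive table.
spark.sql("SELECT * FROM events_raw WHERE event_id IS NOT NULL")
  .write.mode("overwrite")
  .partitionBy("event_date")                                // hypothetical partition column
  .saveAsTable("analytics.events")                          // hypothetical Hive table
```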
Confidential, New York
Data Engineer
Responsibilities:
- Configured Spark Streaming to consume ongoing data from Kafka and store the stream to HDFS (a minimal sketch follows this list).
- Extensive knowledge of building applications using Microsoft Azure as the cloud technology.
- Implemented Spark scripts in Scala, using Spark SQL to access Hive tables in Spark for faster data processing.
- Implemented Hive Partitioning and Bucketing on the collected data in HDFS.
- Loaded data into Spark Data Frames and used Spark-SQL to explore data insights.
- Implemented Azure Application Insights to store user activity and error logging.
- Worked on a Hadoop distribution deployed on AWS EMR instances.
- Worked with and learned a great deal from Amazon Web Services (AWS) cloud services such as EC2, S3, and RDS.
- Pulled data from MySQL Server, an Amazon S3 bucket, and an internal SFTP server, and loaded it into an S3 bucket.
- Worked on Unix shell scripts for pattern matching in log files to format warnings and errors.
- Created Azure Function Apps, Logic Apps, and Power Apps to reduce code complexity.
- Converted a .NET application to a Microsoft Azure Cloud Service project as part of cloud deployment.
- Created a Spark job to process the data and load it into Amazon Redshift.
- Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
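A minimal sketch of the Kafka-to-HDFS flow referenced in this list, written with Spark Structured Streaming; the broker, topic, and paths are placeholders:

```scala
// Illustrative only: consume a Kafka topic and persist the records to HDFS as Parquet.
// Requires the spark-sql-kafka connector on the classpath.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kafka-to-hdfs").getOrCreate()

val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")   // placeholder broker
  .option("subscribe", "events")                       // placeholder topic
  .load()
  .selectExpr("CAST(value AS STRING) AS payload")

stream.writeStream
  .format("parquet")
  .option("path", "hdfs:///data/streaming/events/")            // placeholder sink path
  .option("checkpointLocation", "hdfs:///checkpoints/events/") // needed for fault tolerance
  .start()
  .awaitTermination()
```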
Confidential
Big Data Engineer
Responsibilities:
- Provided L2 production support and monitored jobs using AutoSys for a data warehousing project; coordinated with the L1 and L3 support teams and resolved incidents in a timely manner.
- Involved in data analysis, data mapping, and data modeling, and performed data integrity checks.
- Wrote stored procedures for reports that use multiple data sources.
- Worked on managing the MongoDB environment from availability, performance, and scalability perspectives.
- Added indexes to improve performance on tables.
- Involved in requirements gathering, analysis, effort estimation, low-level design, and peer review.
- Developed complex queries in Hive to validate migration of historical and incremental data between source and target databases (see the sketch after this list).
- Designed and implemented Hive queries and functions for evaluating, filtering, loading, and storing data.
- Worked on ad hoc queries, indexing, replication, load balancing, and aggregation in MongoDB.
- Used Hive for transformations, joins, and some pre-aggregations before storing the data in HDFS.
- Extensive experience in writing UNIX shell scripts and automation of the ETL processes using shell scripting.
- Involved in performance tuning of the ETL process by addressing various performance issues at the extraction and transformation stages.
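A minimal sketch of the kind of migration-validation check referenced above. The original checks were plain HiveQL; the same query is expressed here through Spark's Hive support so the example stays in Scala, and the database, table, and column names are placeholders:

```scala
// Illustrative only: reconcile row counts and an amount checksum per load_date
// between hypothetical source and target Hive tables.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("migration-validation")
  .enableHiveSupport()
  .getOrCreate()

val mismatches = spark.sql(
  """SELECT COALESCE(s.load_date, t.load_date) AS load_date,
    |       s.row_count    AS source_rows,   t.row_count    AS target_rows,
    |       s.amount_total AS source_amount, t.amount_total AS target_amount
    |FROM (SELECT load_date, COUNT(*) AS row_count, SUM(amount) AS amount_total
    |      FROM src_db.transactions GROUP BY load_date) s
    |FULL OUTER JOIN
    |     (SELECT load_date, COUNT(*) AS row_count, SUM(amount) AS amount_total
    |      FROM tgt_db.transactions GROUP BY load_date) t
    |  ON s.load_date = t.load_date
    |WHERE NOT (s.row_count <=> t.row_count)
    |   OR NOT (s.amount_total <=> t.amount_total)""".stripMargin)

mismatches.show(truncate = false)   // any row returned indicates a migration discrepancy
```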
Confidential
Network Support Analyst
Responsibilities:
- Analyzed more than 3,000 cases of KPI degradation; identified root causes using counter data and event/alarm logs.
- Managed and analyzed business problems to identify efficient cost-effective technical solutions.
- Performed real-time analysis of statistical data reports for the 3G/2G cellular network.
- Involved in data analysis, data mapping, and data modeling; performed data integrity and data portability testing by executing SQL statements against customer data.
- Resolved customer complaints within the defined 24-hour SLA.
- Used SOAP for data exchange and application load testing, and performed system testing.
- Analyzed traffic patterns and KPIs to identify degradation from data logs.
- Performed network surveillance by participating in a 24/7 shift cycle for network alarm monitoring.