
Senior Data Engineer Resume


SUMMARY

  • 6+ years of professional experience in Big Data and Data Engineering, with skills in Python, Hadoop, Spark, and related technologies.
  • Hands-on with Google Cloud Platform (GCP) services such as Compute Engine, Cloud Load Balancing, Cloud Storage, Cloud SQL, Stackdriver Monitoring, and Cloud Deployment Manager.
  • Worked with Azure services such as HDInsight, Stream Analytics, Active Directory, Blob Storage, Cosmos DB, Azure Data Lake, Azure Storage, Azure SQL, Azure DW, and Azure Databricks.
  • Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
  • Capable of using AWS utilities such as EMR, S3, and CloudWatch to run and monitor Hadoop and Spark jobs on AWS; implemented various AWS services such as EMR and Redshift.
  • Development, implementation, deployment, and maintenance of analytics technologies such as HDFS, HDF5, MapReduce, Spark, Scala, YARN, Kafka, Pig, Hive, Sqoop, Flume, Oozie, Impala, HBase, Airflow, Zookeeper, Ambari, NiFi, AWS, Azure, and Google Cloud Platform.
  • Developed environments for different applications on AWS by provisioning EC2 instances using Docker, Bash, and Terraform.
  • Worked on Spark SQL: loaded data from Hive tables into DataFrames, prepared data for storage in AWS S3, and interacted with the SQL interface via the command line or JDBC (see the PySpark sketch after this list).
  • Proficient in relational databases like Oracle, MySQL, and SQL Server.
  • Consulted on Snowflake data platform solution architecture, design, development, and deployment, with a focus on bringing a data-driven culture across enterprises.
  • Created an internal tool to compare RDBMS and Hadoop data, ensuring that all data in the source and target matches, using a shell script and reducing the complexity of data movement.
  • Acquired profound knowledge in developing production-ready Spark applications utilizing Spark Core, Spark Streaming, Spark SQL, DataFrames, Datasets, and Spark ML.
  • Experience creating scripts for data modeling and data import/export; extensive experience in deploying, managing, and developing MongoDB clusters.
  • Used the Cloud Shell SDK in GCP to configure Dataproc, Cloud Storage, and BigQuery; coordinated with the team and developed a framework to generate daily ad-hoc reports and extracts from enterprise data in BigQuery.
  • Experience in importing and exporting the data using Sqoop from HDFS to Relational Database Systems and from Relational Database Systems to HDFS.
  • Extensive experience in developing Bash and PL/SQL scripts.
  • Knowledge of integrated development environments such as VS Code, Eclipse, NetBeans, IntelliJ, STS, MATLAB, RStudio, Google Colab, Navicat, Jupyter, Notepad++, and PyCharm.
  • Visualized and explored real-time high-density physiological data collected from Neuro-ICU.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, NLTK, PeakUtils, Dask, TensorFlow, PyTorch, and Keras in Python for developing machine learning algorithms, and applied ML algorithms such as linear regression, multivariate regression, and naive Bayes.
  • Used DevOps tools such as GitHub, Jenkins, JIRA, Docker, and Slack to migrate legacy applications to the cloud platform; Git was used for source code and version control.
  • Able to collaborate with managers and executives to grasp business objectives and deliver as needed; a strong believer in cooperation, with the ability to solve problems both independently and collaboratively.
  • Experience with Tableau, Power BI, Arcadia, and Matplotlib for creating interactive dashboards, reports, and ad-hoc analysis and visualizations.
  • Leveraged AWS KMS and Chef encrypted data bags for proper encryption and security of credentials such as DB passwords.
  • Capable of working with Agile, Spiral, and Waterfall methodologies.
  • Excellent communication, interpersonal, and problem-solving abilities, as well as the ability to work as part of a team; adapts to new environments and technologies in a timely manner.
  • Perform data profiling, identify and communicate data quality issues, and, if needed, collaborate with other teams to remedy them. Extremely well organized, with a track record of completing multiple activities and assignments on time.
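
A minimal PySpark sketch of the Spark SQL pattern called out above (Hive table into a DataFrame, light preparation, Parquet written to S3); the table, column, and bucket names are hypothetical placeholders, not details taken from any actual project.

# Minimal PySpark sketch: read a Hive table into a DataFrame and land it in S3.
# All table, column, and bucket names are illustrative placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-to-s3-export")
    .enableHiveSupport()   # lets spark.sql()/spark.table() see the Hive metastore
    .getOrCreate()
)

# Load a Hive table into a DataFrame via Spark SQL.
orders = spark.sql("SELECT order_id, customer_id, amount, order_date FROM sales.orders")

# Light preparation before storage.
daily_totals = (
    orders.groupBy("order_date")
          .sum("amount")
          .withColumnRenamed("sum(amount)", "daily_amount")
)

# Write to S3 as partitioned Parquet (assumes the S3A connector and IAM access are configured).
daily_totals.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3a://example-bucket/curated/daily_totals/"
)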

TECHNICAL SKILLS

Big Data: HDFS, HDF5, NetCDF, MapReduce, Hive, Pig, Sqoop, Flume, HBase, Kafka, Impala, StreamSets, Oozie, Spark, Zookeeper, NiFi, Airflow, Airbyte.

Operating systems: Linux, Mac OS, Windows

Database Management: MySQL, Oracle, DB2, PostgreSQL, DynamoDB, SQL SERVER, Teradata, Snowflake.

Programming Languages: MATLAB, Python, R, Java, Pig Latin, HiveQL, Shell Scripting, Scala.

Web Development: JavaScript, Node.js, HTML, CSS, Spring, J2EE, JDBC, Angular, Hibernate, Tomcat.

BI: Power BI, Tableau, Talend, Informatica, DAX, SSIS, Panoply (NASA), SSRS, SSAS, QlikView, Qlik Sense.

Software Methodologies: Agile, SDLC

Version control: GIT, SVN, Bitbucket.

Development & Deployment Tools: Eclipse, NetBeans, PySpark, IntelliJ, Spring Tool Suite, Jenkins, Kubernetes, Docker, REST API.

NoSQL: MongoDB, Cassandra, HBase

IDE Dev. Tools: PyCharm, Vi/Vim, Sublime Text, Visual Studio Code, Jupyter Notebook, Google Colab.

Google Cloud Platform: BigQuery, Cloud Dataproc, Google Cloud Storage, Composer, Pub/Sub, Dataflow, Cloud SQL.

Azure: HDInsight, Stream Analytics, Active Directory, Blob Storage, Cosmos DB, Azure Data Lake, Azure Storage, Azure SQL, Azure DW, and Azure Databricks.

PROFESSIONAL EXPERIENCE

Senior Data Engineer

Confidential

Responsibilities:

  • Extensive experience in IT data analytics projects; hands-on experience migrating on-premises ETLs to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud Storage, Composer, Pub/Sub, Dataflow, and Cloud SQL.
  • Migrated an entire Oracle database to BigQuery and used Power BI for reporting; built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators.
  • Experience moving data between GCP and Azure using Azure Data Factory, and building Power BI reports on Azure Analysis Services for better performance.
  • Used the Cloud Shell SDK in GCP to configure Dataproc, Cloud Storage, and BigQuery; coordinated with the team and developed a framework to generate daily ad-hoc reports and extracts from enterprise data in BigQuery; able to work in both the GCP and Azure clouds in parallel.
  • Designed and coordinated with the Data Science team in implementing advanced analytical models in Dataproc over large datasets; created BigQuery authorized views for row-level security and for exposing data to other teams; hands-on experience with programming languages such as Python and SAS.
  • Wrote HiveQL scripts for creating complex tables with performance features such as partitioning, clustering, and skewing; migrated an Oracle SQL ETL to run on Google Cloud Platform using Cloud Dataproc and BigQuery, with Cloud Pub/Sub triggering the Airflow jobs.
  • Downloaded BigQuery data into pandas or Spark DataFrames for advanced ETL capabilities (see the BigQuery-to-pandas sketch after this list); built reports for monitoring data loads into GCP and driving reliability at the site level.
  • Worked with Google Data Catalog and other Google Cloud APIs for monitoring, query, and billing analysis of BigQuery usage; keen on learning the newer technology stack that GCP adds.
  • Worked on creating a POC for utilizing ML models and Cloud ML for table quality analysis in the batch process; examined and evaluated reporting requirements for various business units.
  • Designed new applications for high transaction processing and scalability to seamlessly support future modifications and the growing volume of data processed in the environment.
  • Implemented solutions to run effectively in the cloud and improve the performance of big data processing and the high volume of data handled by the system, to provide better customer support.
  • Used Apache Airflow in the GCP Composer environment to build data pipelines, using operators such as the BashOperator, Hadoop operators, Python callables, and branching operators (a sketch follows this list).
  • Worked with business process managers as a subject-matter expert for transforming vast amounts of data and creating business intelligence reports using state-of-the-art big data technologies.
  • Good knowledge of using Cloud Shell for various tasks and deploying services.
  • Diverse experience in all phases of the software development life cycle (SDLC), especially analysis, design, development, testing, and deployment of applications; used Tableau for data visualization.
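
A hedged sketch of the kind of Composer/Airflow pipeline described above, showing the bash, Python-callable, and branching operators mentioned; it assumes Airflow 2.x, and the DAG id, task names, and commands are hypothetical placeholders.

# Hypothetical Composer/Airflow 2.x DAG sketch: bash, Python-callable, and branching operators.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import BranchPythonOperator, PythonOperator


def extract_daily_report(**context):
    # Placeholder for the daily report/extract logic.
    print("extracting daily report for", context["ds"])


def choose_branch(**context):
    # Branch on the run date: full load on Mondays, incremental otherwise.
    weekday = datetime.strptime(context["ds"], "%Y-%m-%d").weekday()
    return "full_load" if weekday == 0 else "incremental_load"


with DAG(
    dag_id="daily_bq_reports",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    check_source = BashOperator(
        task_id="check_source",
        bash_command="echo 'checking upstream files'",
    )
    branch = BranchPythonOperator(task_id="branch", python_callable=choose_branch)
    full_load = BashOperator(task_id="full_load", bash_command="echo 'full load'")
    incremental_load = BashOperator(task_id="incremental_load", bash_command="echo 'incremental load'")
    report = PythonOperator(
        task_id="extract_daily_report",
        python_callable=extract_daily_report,
        trigger_rule="none_failed_min_one_success",  # run after whichever branch was chosen
    )

    check_source >> branch >> [full_load, incremental_load] >> report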
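
And a minimal sketch of pulling BigQuery data into a pandas DataFrame for the monitoring and ETL work mentioned above, using the google-cloud-bigquery client; the project, dataset, and table names are hypothetical.

# Hypothetical BigQuery-to-pandas pull; project/dataset/table names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

query = """
    SELECT site_id, load_date, COUNT(*) AS rows_loaded
    FROM `example-project.warehouse.load_audit`
    WHERE load_date = @run_date
    GROUP BY site_id, load_date
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[bigquery.ScalarQueryParameter("run_date", "DATE", "2024-01-01")]
)

# to_dataframe() requires the client library's optional pandas dependencies.
df = client.query(query, job_config=job_config).to_dataframe()
print(df.head())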

Environment: Database Management: SQL, Cloud SQL, MongoDB. Google Cloud Platform: GCP Cloud Storage, BigQuery, Composer, Cloud Dataproc, Cloud SQL, Cloud Functions, Cloud Pub/Sub. Reporting: Power BI, Data Studio, Matplotlib.

Big Data Engineer

Confidential

Responsibilities:

  • Actively participated in daily scrum meetings with cross-functional teams in all phases of the SDLC.
  • Installed, configured, and maintained the Hadoop cluster for application development, along with Hadoop ecosystem components such as Hive, Pig, HBase, Zookeeper, and Sqoop.
  • Ingested data from Oracle RDBMS and Teradata systems into the Hadoop data lake.
  • Working knowledge of Spark RDD, Data Frame API, Data set API, Data Source API, Spark SQL, and Spark Streaming.
  • Used Databricks for structured data, running Spark DataFrame operations on Databricks tables.
  • Experienced in handling different types of joins in Spark, such as cross joins and broadcast joins (illustrated in the join sketch after this list).
  • Achieved significant improvement in execution times during the cluster migration from the Hortonworks distribution to Cloudera.
  • Managing and monitoring the Hadoop cluster through Cloudera Manager.
  • Experience using Spark to improve the performance and optimize existing algorithms in Hadoop using the Spark context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Loaded DStream data into Spark RDDs and performed in-memory computation to generate the output response (see the streaming sketch after this list).
  • Developed Spark code in Scala and Spark SQL for faster processing and data transformations.
  • Involved actively in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization during the project.
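
A short PySpark sketch of the join handling mentioned above (broadcast join of a small dimension table, plus an explicit cross join); the table and column names are illustrative placeholders.

# Minimal PySpark join sketch; table and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("join-examples").enableHiveSupport().getOrCreate()

transactions = spark.table("warehouse.transactions")   # large fact table
stores = spark.table("warehouse.stores")                # small dimension table

# Broadcast join: ships the small table to every executor instead of shuffling the fact table.
enriched = transactions.join(broadcast(stores), on="store_id", how="left")

# Cross join: every transaction paired with every calendar row (use sparingly; output is |A| x |B|).
calendar = spark.table("warehouse.calendar_weeks")
expanded = transactions.crossJoin(calendar)

enriched.write.mode("overwrite").parquet("/data/curated/enriched_transactions/")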
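
A hypothetical PySpark Streaming (DStream) sketch of the in-memory micro-batch computation mentioned above; the socket source, host, and port are placeholders.

# Hypothetical DStream sketch: each micro-batch arrives as an RDD and is computed in memory.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="dstream-example")
ssc = StreamingContext(sc, batchDuration=10)   # 10-second micro-batches

lines = ssc.socketTextStream("localhost", 9999)

counts = (
    lines.flatMap(lambda line: line.split())
         .map(lambda word: (word, 1))
         .reduceByKey(lambda a, b: a + b)
)
counts.pprint()   # emit the per-batch output response

ssc.start()
ssc.awaitTermination()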

Environment: Hadoop, Scala, Spark, Spark SQL, Sqoop, HBase/MapR DB, Apache Drill, Hive, MapReduce, HDFS, Maven, Jenkins, Java 7 (JDK 1.7), Eclipse, Oracle 10g, PL/SQL, Linux, Tidal

Full Stack Developer

Confidential

Responsibilities:

  • Worked on a React.js-based CRM (Customer Relationship Management) project.
  • Studied and analyzed Zoho Analytics (a SaaS analytics portal), RDBMS, and React.js.
  • Followed the Software Development Life Cycle (SDLC) and used Agile methodology for developing the application.
  • Designed the front end of the application using React.js, HTML, CSS, JSON, and jQuery, and worked on the back end of the application.
  • Involved in analysis and design of the application features.
  • Created UI using JavaScript and HTML/CSS.
  • Wrote back-end code in Python.
  • Used JavaScript and XML to update portions of a webpage.
  • Worked on changes to OpenStack to accommodate large-scale data center deployments.
  • Worked in the MySQL database, writing simple queries and stored procedures for normalization.
  • Responsible for handling the integration of the database system.
  • Developed and deployed SOAP-based web services on a Tomcat server.
  • Used an object-relational mapping (ORM) solution, a technique for mapping the data representation of the MVC model to an SQL-based schema (a sketch follows this list).
  • Used an IDE to develop the application and JIRA for bug and issue tracking.
  • Used GIT to coordinate team development.
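
A small, hypothetical sketch of the ORM mapping described above; SQLAlchemy is assumed here purely for illustration (the original does not name the ORM that was used), and the model, table, and connection details are placeholders.

# Hypothetical ORM mapping sketch (SQLAlchemy 1.4+ assumed; not the project's actual code).
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Customer(Base):
    """Maps the MVC model's customer representation to a MySQL table."""
    __tablename__ = "customers"

    id = Column(Integer, primary_key=True)
    name = Column(String(120), nullable=False)
    email = Column(String(255), unique=True)


# The connection string is a placeholder; credentials would come from configuration.
engine = create_engine("mysql+pymysql://user:password@localhost/crm")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

with Session() as session:
    session.add(Customer(name="Ada Lovelace", email="ada@example.com"))
    session.commit()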

Environment: Python 2.7, MySQL, React Js, HTML, CSS, JavaScript, jQuery, Sublime text, JIRA, GIT

Data Analyst

Confidential

Responsibilities:

  • Compiled all available customer information; the data gathered to anticipate churn covered areas such as demographics, client location, purchased services, percentage of cloud space used, and daily transaction rate.
  • Used machine learning (logistic regression and PCA) to estimate customer acquisition costs (see the modeling sketch after this list); performed software analysis.
  • Validated data items through exploratory data analysis (univariate, bivariate, and multivariate analysis).
  • Responsible for building the data analysis infrastructure to collect, analyze, and visualize data.
  • Used R-squared and VIF values to choose the variables.
  • Tested and debugged applications.
  • Made suggestions for enhancements to application procedures and infrastructure.
  • Collaborated with cross-functional teams.
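
A minimal scikit-learn sketch of the logistic-regression-plus-PCA modeling mentioned above; the input file and feature layout are hypothetical placeholders, and the R-squared/VIF-based variable selection would typically be done separately (e.g., with statsmodels).

# Hypothetical churn/acquisition modeling sketch: scaling, PCA, then logistic regression.
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Placeholder file; assumes numeric features after any upstream encoding.
df = pd.read_csv("customers.csv")  # demographics, services, % space used, daily transactions, ...
X = df.drop(columns=["churned"])
y = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=0.95)),        # keep components explaining ~95% of variance
    ("clf", LogisticRegression(max_iter=1000)),
])

model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))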

Environment: Python, MS SQL SERVER, T-SQL, SSIS, SSRS, SQL Server Management Studio, Oracle, Excel, Tableau, Informatica.
