
Data Engineer Resume


SUMMARY

  • 7 years of experience in the design and deployment of enterprise and web applications, including 6+ years of comprehensive experience in the big data ecosystem.
  • Complete understanding of Hadoop daemons such as JobTracker, TaskTracker, NameNode, and DataNode, which provide client communication, job execution and management, resource scheduling, and resource management.
  • Expertise in the Hadoop architecture and ecosystem, including HDFS, Sqoop, Spark, NiFi, Pig, and Oozie.
  • Good knowledge of MapReduce for processing large data sets with a parallel, distributed algorithm.
  • Skilled in Hive for data query and analysis, and in Sqoop for transferring data between relational databases and Hadoop.
  • Expertise in ingesting data from external sources to HDFS for data processing using Flume.
  • Hands on experience in creating workflows and scheduling jobs using Oozie.
  • Expertise in ZooKeeper, an open-source server that provides distributed configuration, synchronization, and naming registry services.
  • Experience in installing, configuring, managing, supporting, and monitoring Hadoop clusters using distributions such as Cloudera and Hortonworks and cloud services such as AWS and GCP.
  • Knowledge of NiFi for automating the flow of data and of StreamSets for delivering data to every part of the business with smart data pipelines.
  • Expertise in writing custom Kafka consumer code and modifying existing producer code in Python to push data to Spark-streaming jobs.
  • Experience with NumPy for multi-dimensional arrays and matrices, Matplotlib for plotting, Pandas for concise data manipulation and analysis, and PySpark for writing Spark applications using the Python API.
  • Ample knowledge of Apache Kafka and Apache Storm for building data platforms, pipelines, and storage systems, and of search technologies such as Elasticsearch.
  • Worked extensively with semi-structured data (fixed-length and delimited files) for data sanitization, report generation, and standardization.
  • Extensive experience working with AWS Cloud services and AWS SDKs to work with services like AWS API Gateway, Lambda, S3, IAM and EC2.
  • Experience using Kafka and Kafka brokers to initiate a Spark context and process live streams (a minimal sketch follows this list).
  • Good at implementing custom Kafka encoders for custom input formats to load data into partitions.
  • Experienced in providing highly available and fault tolerant applications utilizing orchestration technology on Google Cloud Platform (GCP).
  • Experienced in planning and capacity sizing for migrating the Confidential BigInsights (on-premises) solution to a cloud-native GCP solution involving tools such as Dataproc, Dataflow, Cloud Functions, Google Cloud Storage, and Pub/Sub.
  • Knowledge of automated deployments leveraging Azure Resource Manager templates, Azure DevOps, and Git repositories for automation and continuous integration/continuous delivery (CI/CD).
  • Experienced in writing user-defined functions (UDFs) in Spark, HiveQL, and SQL for data processing and analysis.
  • Developed expertise in Cassandra, a highly scalable database management system built to handle large amounts of data.
  • Experienced with Google Cloud Platform (GCP) services such as Compute Engine, Cloud Load Balancing, Cloud Storage, Cloud SQL, Stackdriver Monitoring, and Cloud Deployment Manager.
  • Gained knowledge and expertise in PostgreSQL, an open-source object-relational database system used as the primary data store or data warehouse for many web, mobile, geospatial, and analytics applications.
  • Worked with Impala to access and analyze data stored on Hadoop data nodes without data movement, providing fast access to data in HDFS, and used Couchbase to move data from the data center to the cloud and the edge.
  • Working experience with NoSQL databases such as HBase for storing sparse data sets in a fault-tolerant way, and used Teradata built-in functions that extend SQL.
  • Extensive experience across relational and non-relational databases, including Oracle for managing data with high performance, authorized access, and failure recovery, as well as PL/SQL, SQL Server, MySQL, and DB2.
  • Gained experience in Databricks, a collaborative environment for running interactive and scheduled data analysis.
  • Gained expertise in the ELK stack for faster troubleshooting and security analytics, and in Splunk for making information searchable and generating alerts, reports, and visualizations.
  • Extensive skills in Tableau for making big data small, and small data insightful and actionable.
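
A minimal sketch of the Kafka-to-Spark streaming pattern referenced above, written with PySpark Structured Streaming; the broker address, topic name, payload schema, and HDFS paths are hypothetical placeholders, not details from any particular engagement.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("kafka-to-hdfs-sketch").getOrCreate()

# Hypothetical schema for the JSON messages on the topic.
event_schema = StructType([
    StructField("device_id", StringType()),
    StructField("usage_gb", DoubleType()),
])

# Subscribe to a Kafka topic (broker and topic names are placeholders).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker-1:9092")
       .option("subscribe", "router-usage")
       .load())

# Kafka delivers the value as bytes; cast to string and parse the JSON payload.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# Persist the parsed stream to HDFS as Parquet with checkpointing.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/router_usage")
         .option("checkpointLocation", "hdfs:///checkpoints/router_usage")
         .start())
query.awaitTermination()
```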

TECHNICAL SKILLS

PROGRAMMING LANGUAGES: Java, Scala, Python, Shell Scripting, C#

BIG DATA ECOSYSTEM: Spark, Hive, HBase, Sqoop, Oozie, Storm, Flume, Pig, Kafka, NiFi, Zookeeper, MapReduce

CLOUD: AWS EMR, EC2, S3, RDS, Azure Databricks, Azure Data Factory, GCP

DBMS: SQL Server, MySQL, PL/SQL, Oracle, Cassandra, Vertica, Versant

WEB TECHNOLOGIES: HTML, JavaScript, XML, JQuery, Ajax, CSS

APPLICATION SERVERS: WebLogic, WebSphere

IDEs: Eclipse, IntelliJ, Visual Studio, WinSCP

DevOps: GitHub, Jenkins, Ansible, Chef, Docker, Nagios, Puppet

OPERATING SYSTEMS: Windows, Unix, Linux, Solaris, CentOS

FRAMEWORKS: MVC, Struts, Maven, JUnit, Log4j, Ant, Tableau, Splunk, Aqua Data Studio

J2EE TECHNOLOGIES: Spring, Servlets, J2SE, JSP, JDBC

PROFESSIONAL EXPERIENCE

Data Engineer

Confidential

Responsibilities:

  • Used Spark with Scala to import customer information from an Oracle database into HDFS for data processing, along with minor cleansing.
  • Developed MapReduce jobs to calculate the total data usage of commercial routers in different locations using the Hortonworks distribution.
  • Involved in information gathering for new enhancements in Spark/Scala, production support for field issues, and label installs for Hive scripts and MapReduce jobs.
  • Developed Spark applications in Python on a distributed environment to load massive numbers of CSV files with differing schemas into Hive tables (see the PySpark CSV-to-Hive sketch after this list).
  • Used Maven to build RPMs from Scala source code checked out from a Git repository, with Jenkins as the continuous integration server and Artifactory as the repository manager.
  • Responsible for Setting up UNIX/Linux environments for various applications using shell scripting.
  • Stored data in AWS S3 as an HDFS-like store and ran EMR programs on data stored in S3.
  • Used the AWS Glue ETL service to consume raw data from an S3 bucket, transform it per requirements, and write the output back to an S3 bucket in Parquet format for data analytics.
  • Developed a Python script to load CSV files into S3 buckets; created AWS S3 buckets, performed folder management in each bucket, and managed logs and objects within each bucket (see the boto3 sketch after this list).
  • Worked with CloudFormation to automate AWS environment creation, along with the ability to deploy to AWS using build scripts (Boto3 and AWS CLI).
  • Set up scalability for application servers using the command-line interface, and set up and administered DNS in AWS using Route 53.
  • Wrote Python scripts to update content in the database and manipulate files; involved in building database models, APIs, and views using Python to build applications.
  • Visualized and manipulated data using libraries such as NumPy, SciPy, and Pandas in Python scripts for thorough data analysis.
  • Translated customer business requirements into technical design documents, established specific solutions, and led the efforts, including programming in Spark with Scala and testing, that culminated in client acceptance of the results.
  • Expertise in object-oriented design (OOD) and end-to-end software development, working on Scala coding and implementing mathematical models in Spark analytics.
  • Created Hive external tables on top of datasets loaded into AWS S3 buckets and created various Hive scripts to produce a series of aggregated datasets for downstream analysis.
  • Used AWS Lambda to perform data validation, filtering, sorting, and other transformations for every data change in an HBase table and to load the transformed data into RDS (see the Lambda sketch after this list).
  • Loading data from different servers to S3 bucket and setting appropriate bucket permissions.
  • Configured routing to send JMS messages that interact with the application for real-time data using Kafka.
  • Managed Zookeeper for cluster co-ordination and Kafka Offset monitoring.
  • Optimized legacy queries to extract the customer information from Oracle.
  • Reviewed HDFS usage and system design for future scalability and fault tolerance.
  • Strong experience in implementing data warehouse solutions in Confidential Redshift.
  • Worked on various projects to migrate data from on-premises databases to Confidential Redshift and RDS.
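
A minimal sketch of loading CSV feeds with differing schemas into Hive tables with PySpark, as described above; the landing paths, database, and table names are hypothetical.

```python
from pyspark.sql import SparkSession

# Hypothetical HDFS landing directories keyed by target Hive table name.
LANDING_DIRS = {
    "customers": "hdfs:///landing/customers/*.csv",
    "router_usage": "hdfs:///landing/router_usage/*.csv",
}

spark = (SparkSession.builder
         .appName("csv-to-hive-sketch")
         .enableHiveSupport()
         .getOrCreate())

for table, path in LANDING_DIRS.items():
    # Each feed carries its own schema, so infer it from the header row.
    df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv(path))
    # Overwrite the corresponding Hive table with the freshly loaded data.
    df.write.mode("overwrite").saveAsTable(f"staging.{table}")
```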
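
A minimal boto3 sketch of the S3 loading script mentioned above; the bucket name, local directory, and key prefix are hypothetical.

```python
import logging
from pathlib import Path

import boto3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("s3_loader")

BUCKET = "example-raw-data"        # hypothetical bucket
LOCAL_DIR = Path("/data/exports")  # hypothetical local export directory

s3 = boto3.client("s3")

def upload_csvs(prefix: str) -> None:
    """Upload every CSV in LOCAL_DIR under the given S3 key prefix."""
    for csv_file in LOCAL_DIR.glob("*.csv"):
        key = f"{prefix}/{csv_file.name}"
        s3.upload_file(str(csv_file), BUCKET, key)
        log.info("uploaded %s to s3://%s/%s", csv_file, BUCKET, key)

if __name__ == "__main__":
    upload_csvs("incoming/csv")
```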
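
A minimal sketch of a Python Lambda handler that validates change records and loads them into RDS (MySQL via PyMySQL); the event shape, environment variables, table, and columns are assumptions for illustration only.

```python
import os

import pymysql  # assumed to be packaged with the Lambda deployment

# Hypothetical connection details supplied through environment variables.
conn = pymysql.connect(
    host=os.environ["RDS_HOST"],
    user=os.environ["RDS_USER"],
    password=os.environ["RDS_PASSWORD"],
    database=os.environ["RDS_DATABASE"],
)

def lambda_handler(event, context):
    """Validate, sort, and upsert incoming change records into RDS."""
    records = event.get("records", [])  # assumed event shape
    valid = [r for r in records
             if r.get("row_key") and r.get("usage_gb") is not None]

    with conn.cursor() as cur:
        for r in sorted(valid, key=lambda r: r["row_key"]):
            cur.execute(
                "REPLACE INTO router_usage (row_key, usage_gb) VALUES (%s, %s)",
                (r["row_key"], float(r["usage_gb"])),
            )
    conn.commit()
    return {"processed": len(valid), "skipped": len(records) - len(valid)}
```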

Environment: HDFS, Spark, Scala, Python, Shell Scripting, Hive, HBase, Oracle, MapReduce, Logstash, Jenkins, Versant, Java, Kafka, Hortonworks, Git, ClearCase, Zookeeper, Ansible, AWS.

Big Data Engineer

Confidential

Responsibilities:

  • Worked as a Hadoop developer with Hadoop Ecosystems components like HBase, Sqoop, Zookeeper, Oozie, Hive and Pig with Cloudera Hadoop distribution.
  • Involved in Agile development methodology as an active member in Scrum meetings.
  • Worked in Azure environment for development and deployment of Custom Hadoop Applications.
  • Involved in the end-to-end process of Hadoop jobs that used technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts.
  • Implemented various Azure platforms such as Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, data Lake and Data Factory.
  • Extracted and loaded data into Data Lake environment (MS Azure) by using Sqoop which was accessed by business users.
  • Managed and supported enterprise data warehouse operations and big data advanced predictive application development using Cloudera and Hortonworks HDP.
  • Developed Pig scripts to transform the raw data into intelligent data as specified by business users.
  • Installed Hadoop, MapReduce, and HDFS on Azure to develop multiple MapReduce jobs in Pig and Hive for data cleansing and pre-processing.
  • Migrated an entire Oracle database to BigQuery and used Power BI for reporting (see the BigQuery load sketch after this list).
  • Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators (see the DAG sketch after this list).
  • Experience in moving data between GCP and Azure using Azure Data Factory.
  • Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Improved the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Developed a Spark job in Java that indexes data into Elasticsearch from external Hive tables stored in HDFS.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs and developed a data pipeline using Kafka and Storm to store data in HDFS.
  • Used Spark Streaming to receive real-time data from Kafka and stored the stream data in HDFS using Scala, as well as in NoSQL databases such as HBase and Cassandra.
  • Documented the requirements, including the available code to be implemented using Spark, Hive, HDFS, HBase, and Elasticsearch.
  • Performed transformations such as event joins, filtering bot traffic, and some pre-aggregations using Pig.
  • Used Windows Azure SQL Reporting Services to create reports with tables, charts, and maps.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
  • Configured Oozie workflow to run multiple Hive and Pig jobs which run independently with time and data availability.
  • Imported and exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
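
A minimal sketch of loading an Oracle table export (already staged as CSV in Cloud Storage) into BigQuery with the google-cloud-bigquery client, as referenced above; the project, dataset, table, and GCS path are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client(project="example-project")  # hypothetical project

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

# Load the staged CSV export into a BigQuery staging table.
load_job = client.load_table_from_uri(
    "gs://example-landing/oracle_export/customers/*.csv",
    "example-project.staging.customers",
    job_config=job_config,
)
load_job.result()  # block until the load job finishes

table = client.get_table("example-project.staging.customers")
print(f"Loaded {table.num_rows} rows into staging.customers")
```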
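
A minimal Airflow DAG sketch of the extract-validate-load pipeline pattern described above (Airflow 2.x operator imports); the DAG id, schedule, bucket, and commands are hypothetical placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def validate_extract(**context):
    """Placeholder validation step for the extracted files."""
    print("validating extracted files for", context["ds"])

with DAG(
    dag_id="gcp_etl_sketch",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Stage the daily export into a (hypothetical) GCS landing bucket.
    extract = BashOperator(
        task_id="extract_to_gcs",
        bash_command="gsutil cp /data/exports/*.csv gs://example-landing/{{ ds }}/",
    )
    validate = PythonOperator(
        task_id="validate_extract",
        python_callable=validate_extract,
    )
    # Load the staged files into BigQuery with the bq CLI.
    load = BashOperator(
        task_id="load_to_bigquery",
        bash_command=(
            "bq load --source_format=CSV --autodetect "
            "staging.router_usage gs://example-landing/{{ ds }}/*.csv"
        ),
    )
    extract >> validate >> load
```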

Environment: Hadoop 3.0, Azure, Sqoop 1.4.6, Pig 0.17, Hive 2.3, MapReduce, Spark 2.2.1, Shell scripts, SQL, Hortonworks, Python, MLlib, HDFS, YARN, Java, Kafka 1.0, Cassandra 3.11, Oozie, Agile

Java/J2EE Developer

Confidential

Responsibilities:

  • Involved in the software development life cycle coding, testing, and implementation.
  • Worked in the health-care domain.
  • Involved in using Java Message Service (JMS) for loosely coupled, reliable, and asynchronous exchange of patient treatment information among J2EE components and legacy systems.
  • Developed MDBs using JMS to exchange messages between different applications using MQ Series.
  • Involved in working with J2EE Design patterns (Singleton, Factory, DAO, and Business Delegate) and Model View Controller Architecture with JSF and Spring DI.
  • Involved in Content Management using XML.
  • Involved in analysis and design phase of Software Development Life cycle (SDLC).
  • Used JMS to pass messages as payload to track statuses, milestones, and states in the workflows.
  • Involved in reading and generating PDF documents using iText, and merged the PDFs dynamically.
  • Developed a standalone module transforming 837 XML transactions into the database using a SAX parser.
  • Installed, Configured, and administered WebSphere ESB v6.x
  • Worked on Performance tuning of WebSphere ESB in different environments on different platforms.
  • Configured and Implemented web services specifications in collaboration with offshore team.
  • Involved in Creating dashboard charts (business charts) using fusion charts.
  • Involved in creating reports for most of the business criteria.
  • Involved in the configuration of WebLogic servers, data sources, JMS queues, and the deployment.
  • Involved in creating queues, MDBs, and workers to accommodate the messaging to track the workflows.
  • Created Hibernate mapping files, sessions, transactions, Query and Criteria to fetch the data from DB.
  • Developed ANT scripts to build and deploy projects onto the application server.
  • Involved in implementing CruiseControl as a continuous build tool using Ant.
  • Enhanced the design of an application by utilizing SOA.
  • Generating Unit Test cases with the help of internal tools.
  • Used JNDI for connection pooling.

Environment: Java/J2EE, HTML, JS, AJAX, Servlets, JSP, XML, XSLT, XPath, XQuery, WSDL, SOAP, REST, JAX-RS, Jersey, JAX-WS, WebLogic Server 10.3.3, JMS, iText, Eclipse, JUnit, StarTeam, JNDI, Spring Framework - DI, AOP, Batch, Hibernate.

Java Developer

Confidential

Responsibilities:

  • Professional experience in the development and deployment of various object-oriented and web-based enterprise applications using Java/J2EE technologies, working across the complete System Development Life Cycle (SDLC).
  • Designed and developed the UI of the website using HTML, Spring Boot, React JS, CSS, and JavaScript.
  • Proficient in Core Java, J2EE, JDBC, Servlets, JSP, Exception Handling, Multithreading, EJB, XML, HTML5, CSS3, JavaScript, AngularJS.
  • Experience in building fast and scalable network applications using Node Js and for quick startup of a production-grade, stand-alone application used Spring Boot.
  • Utilized Spring Boot and java as backend and React JS as frontend and MYSQL as database.
  • Designed and developed data management system using MySQL. Built application using Spring JPA for database persistence.
  • Expertise in application/web servers such as Confidential WebSphere, WebLogic Application Server, JBoss, and Tomcat.
  • In the backend, worked on persisting the data shown on the screen to the database for a particular user after an Excel sheet was uploaded.
  • Created a dashboard for managers to compare allocation details based on their monthly time.
  • Worked on an upload feature that allows the user to upload an Excel sheet and converts that data into a readable format with the help of React JS.
  • Created a framework on Spring Boot concepts using Spring JPA for database persistence.
  • Experienced in developing complex MySQL queries, Procedures, Stored Procedures, Packages and Views in MySQL database.
  • Ensured availability and security for database in a production environment.
  • Configured, tuned, and maintained MySQL Server database servers.
  • Implemented monitoring and established best practices around using react libraries.
  • Effectively communicated with the external vendors to resolve queries.

Environment: Java, JavaScript, Spring Boot, CSS, SQL, MySQL, React JS, Apache web server, Confidential WebSphere.
