
Data Engineer Resume


SUMMARY

  • Comprehensive experience of 8+ years in the software engineering profession, with over 4 years in Hadoop and Scala (Spark) development and administration, along with 2 years of experience as a Data Analyst.
  • Over 4 years of experience with Hadoop architecture and its components, including HDFS (NameNode, DataNode), MapReduce (JobTracker, TaskTracker), and the MapReduce programming paradigm.
  • Experience in building data pipelines using Azure Data Factory and Azure Databricks, and loading data into Azure Data Lake, Azure SQL Database, and Azure SQL Data Warehouse, including controlling and granting database access.
  • Experience using Hadoop ecosystem components such as HDFS, YARN, MapReduce, Spark, Sqoop, Hive, and Kafka.
  • Experience with data warehousing and data mining using NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Experience using Sqoop to ingest data from RDBMS into HDFS.
  • Experienced in using Python libraries such as NumPy, SciPy, python-twitter, and Pandas.
  • Worked with visualization tools such as Tableau for report creation and further analysis.
  • Experienced with the Spark processing framework, including Spark SQL, as well as data warehousing and ETL processes.
  • Developed an end-to-end ETL pipeline using Spark SQL and Scala on the Spark engine: imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs (see the sketch after this list).
  • Experience with Spark Streaming and writing Spark jobs.
  • Experience developing high-throughput streaming applications that consume from Kafka topics and write enriched data back to outbound Kafka topics.
  • Experience using Sqoop to move data in both directions between HDFS and relational database systems (RDBMS) such as Oracle, DB2, and SQL Server.
  • Sound experience with the AWS cloud (EMR, EC2, RDS, EBS, S3, Lambda, Glue, Elasticsearch, Kinesis, SQS, DynamoDB, Redshift, ECS).
  • Used RStudio for data pre-processing and for building machine learning models on datasets.
  • Good knowledge of NLP, Statistical Models, Machine Learning, and Data
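
To illustrate the kind of S3-to-Spark ETL work summarized above, the following is a minimal PySpark sketch (the original pipeline was written in Scala); the bucket names, paths, and column names are hypothetical placeholders, and the cluster would need the hadoop-aws connector and S3 credentials configured.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Minimal sketch of an S3 -> Spark -> S3 ETL step.
    # Bucket names, paths, and columns are hypothetical placeholders.
    spark = (SparkSession.builder
             .appName("s3-etl-sketch")
             .getOrCreate())

    # Read raw CSV data from S3.
    raw = (spark.read
           .option("header", "true")
           .csv("s3a://example-raw-bucket/orders/*.csv"))

    # Apply simple transformations: filter bad rows, derive a column, aggregate.
    cleaned = (raw
               .filter(F.col("order_id").isNotNull())
               .withColumn("order_ts", F.to_timestamp("order_date"))
               .groupBy("customer_id")
               .agg(F.sum("amount").alias("total_amount")))

    # Write the curated result back to S3 in Parquet format.
    cleaned.write.mode("overwrite").parquet("s3a://example-curated-bucket/orders_by_customer/")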

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Impala, Ambari, ZooKeeper, Kafka, Apache Spark, PySpark, Distributed Cache, Hadoop cluster administration, Cloudera CDH/CDH4

NoSQL Databases: HBase, Cassandra, MongoDB, Amazon DynamoDB, Redis, Elasticsearch

Relational Databases: Oracle, DB2, MS SQL Server (SQL Server 2000), MySQL, PostgreSQL, Teradata, Asteradata, Microsoft SQL Azure

Cloud & DevOps: Amazon Web Services (EC2, S3, EBS, Kinesis), Docker, Kubernetes, Terraform, Jenkins, Continuous Integration/Delivery, Git, Subversion (SVN)

ETL & BI Tools: Informatica, DataStage, ETL, Power BI, Splunk

Data Management & Analytics: Data Analysis, Data Analytics, Data Architecture, Data Cleansing, Data Collection, Data Governance, Data Integration, Data Management, Data Manipulation, Data Migration, Data Mining, Data Modeling, Data Models, Database Modeling, Database Systems, Data Profiling, Data Science, Data Sources, Data Transformation, Data Warehouses, Datasets, Metadata, Star Schema, Unstructured Data, Machine Learning, Natural Language Processing (NLP)

Programming & Scripting: Python (NumPy, Pandas, Matplotlib, Flask), R, ggplot2, SQL, DDL, Stored Procedures, JDBC, HTML, JavaScript, XML, Scripting, Real-time Processing, APIs, Application Servers, B2B Software, Software Engineering, Coding, Design Patterns, Object-Oriented (OO) Design

PROFESSIONAL EXPERIENCE

Confidential

Data Engineer

Responsibilities:

  • Involved in review of functional and non-functional requirements.
  • Developed and maintained help desk metrics for an IT group supporting thousands of end users.
  • Extensively used Databricks notebooks for interactive analytics using Spark APIs.
  • Developed Informatica mappings and reusable transformations for timely loading of data into a star schema.
  • Supported the analytical platform, handled data quality, and improved performance using Scala's higher-order functions, lambda expressions, pattern matching, and collections.
  • Implemented scalable microservices to handle concurrency and high traffic; optimized existing Scala code and improved cluster performance.
  • Reduced access time by refactoring data models and optimizing queries, and implemented a Redis cache to support Snowflake.
  • Designed and developed Spark jobs for streaming real-time data received from RabbitMQ and IBM MQ through Kafka and Spark Streaming (a minimal sketch follows this list).
  • Experienced with the Apache Spark streaming and batch frameworks; created Spark jobs for data transformation and aggregation.
  • Performed data cleansing and applied transformations using Databricks and Spark data analysis.
  • Developed Hive queries to process the data and generate datasets for visualization.
  • Performed data analysis using HiveQL, Pig Latin, and custom MapReduce programs in Java.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process; wrote Python scripts to automate the extraction of weblogs using Airflow DAGs.
  • Responsible for estimating cluster size, and for monitoring and troubleshooting the Databricks Spark cluster.
  • Wrote various Lambda services to automate functionality in the cloud.
  • Implemented Spark Java UDFs to handle data quality, filtering, and data validation checks.
  • Used Airflow to schedule, deploy, and run Docker containers in production Kubernetes clusters.
  • Spun up clusters and used Hadoop ecosystem tools such as Kafka, Spark, and Databricks for real-time streaming analytics, and Sqoop, Pig, Hive, and Cosmos DB for batch jobs.
  • Analyzed and optimized pertinent data stored in Snowflake using PySpark and Spark SQL.
  • Worked with the DevOps team to clusterize a NiFi pipeline on EC2 nodes integrated with Spark, Kafka, and Postgres running on other instances, using SSL handshakes.
  • Used the Kinesis Agent to ingest data and Kinesis Data Streams to stream data in real time.
  • Worked with Continuous Integration/Continuous Delivery (CI/CD) using Jenkins for timely builds and test runs.
  • Worked on a NiFi data pipeline to process large data sets and configured lookups for data validation and integrity.
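
A minimal sketch of the Kafka-to-Kafka Spark Structured Streaming pattern referenced above is shown below; the broker addresses, topic names, JSON schema, and checkpoint path are hypothetical, and the job requires the spark-sql-kafka connector on the classpath.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Sketch of a Kafka -> Spark Structured Streaming -> Kafka enrichment job.
    # Brokers, topics, the JSON schema, and paths are hypothetical placeholders.
    spark = SparkSession.builder.appName("kafka-enrichment-sketch").getOrCreate()

    # Read raw events from an inbound Kafka topic.
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")
           .option("subscribe", "events-inbound")
           .load())

    # Parse the JSON payload and add a simple enrichment column.
    schema = "event_id STRING, user_id STRING, amount DOUBLE"
    enriched = (raw
                .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
                .select("e.*")
                .withColumn("is_large", F.col("amount") > 1000))

    # Write the enriched records back to an outbound Kafka topic.
    out = enriched.select(
        F.col("event_id").alias("key"),
        F.to_json(F.struct("event_id", "user_id", "amount", "is_large")).alias("value"),
    )
    query = (out.writeStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "broker1:9092")
             .option("topic", "events-enriched")
             .option("checkpointLocation", "/tmp/checkpoints/events-enriched")
             .start())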

Environment: Sqoop, MapReduce, Pig, Hive, Oozie, ZooKeeper, Java, Shell scripting, Spark, Spark SQL, Flume, Databricks, AWS

Confidential

Data Engineer

Responsibilities:

  • Familiarity with Hive joins; used HQL for querying the databases, eventually leading to complex Hive UDFs.
  • Installed the OS and administered the Hadoop stack on the CDH5 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning.
  • Worked on installing the cluster, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning, and slots configuration.
  • Worked on installing Cloudera Manager and CDH, installing the JCE policy files, creating a Kerberos principal for the Cloudera Manager Server, and enabling Kerberos using the wizard.
  • Leveraged Chef to manage and maintain builds in various environments; planned hardware and software installation on the production cluster and coordinated with multiple teams to get it done.
  • Conducted exploratory data analysis using Python (Matplotlib and Seaborn) to identify underlying patterns and correlations between features.
  • Worked with NoSQL databases such as HBase, creating tables to load large sets of semi-structured data coming from source systems.
  • Modeled complex ETL jobs that transform data visually with data flows or by using compute services such as Azure Databricks, Azure Blob Storage, Azure SQL Database, and Cosmos DB.
  • Extensively used Azure services such as Azure Data Factory and Logic Apps for ETL, pushing data in and out between databases, Blob Storage, HDInsight (HDFS), and Hive tables.
  • Worked on building an enterprise data lake using Data Factory and Blob Storage, enabling other teams to work with more complex scenarios and ML solutions.
  • Developed web services in the Play framework using Scala while building a streaming data platform.
  • Worked with data modelers to understand the financial data model and provided suggestions for the logical and physical data models.
  • Experienced in writing live real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline system.
  • Developed Python scripts for collecting Redshift CloudWatch metrics and automating loading of the data points into a Redshift database (a minimal sketch follows this list).
  • Developed scripts for loading application call logs to S3 and used AWS Glue ETL to load them into Redshift for the data analytics team.
  • Installed IBM HTTP Server, WebSphere plugins, and WebSphere Application Server Network Deployment (ND).
  • Strong development skills with Azure Data Lake, Azure Data Factory, SQL Data Warehouse, Azure Blob Storage, and Azure Storage.
  • Provided production support and enhancement of existing databases and tools for analysis and decision making.
  • Expertise in SQL Server and T-SQL (DDL, DML, and DCL), constructing tables, joins, indexed views, indexes, complex stored procedures, triggers, and user-defined functions to facilitate efficient data manipulation and consistent data storage according to business rules.
  • Configured the environment to load data from on-premises sources through pipelines using U-SQL and Azure Blob Storage into Azure SQL, and created scheduled PowerShell scripts for daily loads.
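
To illustrate the CloudWatch metrics collection mentioned above, here is a minimal Python sketch using boto3; the region, cluster identifier, and metric choice are hypothetical placeholders, and the subsequent load into Redshift is only indicated in a comment.

    import datetime
    import boto3

    # Sketch: pull Redshift CPU utilization datapoints from CloudWatch.
    # Region and cluster identifier are hypothetical placeholders.
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    end = datetime.datetime.utcnow()
    start = end - datetime.timedelta(hours=1)

    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/Redshift",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "ClusterIdentifier", "Value": "example-cluster"}],
        StartTime=start,
        EndTime=end,
        Period=300,
        Statistics=["Average"],
    )

    # Each datapoint could then be inserted into a Redshift metrics table
    # (for example via psycopg2 or the Redshift Data API).
    for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], round(point["Average"], 2))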

Confidential

Jr. Data Engineer

Responsibilities:

  • Conducted statistical analysis on data using Python and various tools.
  • Responsibilities included gathering business requirements, developing the strategy for data cleansing and data migration, writing functional and technical specifications, creating source-to-target mappings, designing data profiling and data validation jobs in Informatica, and creating ETL jobs in Informatica.
  • Worked on a Hadoop cluster that ranged from 4-8 nodes during the pre-production stage and was sometimes extended up to 24 nodes during production.
  • Experience in analyzing data using Python, R, SQL, Microsoft Excel, Hive, PySpark, and Spark SQL for data mining, data cleansing, data munging, and machine learning.
  • Designed changes to transform current Hadoop jobs to use HBase.
  • Performed validation on machine learning output from R.
  • Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files.
  • Worked with packages such as ggplot2 and Shiny in R to understand data and develop applications.
  • Implemented bucketing and partitioning in Hive to assist users with data analysis (see the sketch after this list).
  • Developed predictive models using Python and R to predict customer churn and classify customers.
  • Developed database management systems for easy access, storage, and retrieval of data.
  • Performed DB activities such as indexing, performance tuning, and backup and restore.
  • Expertise in writing Hadoop jobs for analyzing data using HiveQL, Pig Latin (a data flow language), and custom MapReduce programs in Java.
  • Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Expert in creating Hive UDFs in Java to analyze data efficiently.
  • Responsible for loading data from the BDW Oracle database and Teradata into HDFS using Sqoop.
  • Formulated procedures for integrating R programming plans with data sources and delivery systems.
  • Wrote data ingestion systems to pull data from traditional RDBMS platforms such as Oracle and Teradata and store it in NoSQL databases such as MongoDB.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzing them by running Hive queries.
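
As a sketch of the Hive partitioning and bucketing referenced above, the snippet below expresses the same idea through Spark SQL with Hive support; the database, table, and column names are hypothetical placeholders, and the original work was done directly in Hive DDL.

    from pyspark.sql import SparkSession

    # Sketch of Hive-style partitioning and bucketing via Spark SQL.
    # Database, table, and column names are hypothetical placeholders.
    spark = (SparkSession.builder
             .appName("hive-partition-bucket-sketch")
             .enableHiveSupport()
             .getOrCreate())

    events = spark.table("staging.web_events")  # assumed staging table

    # Partition by date and bucket by user_id so analysts can prune partitions
    # and benefit from bucketed joins on user_id.
    (events.write
     .partitionBy("event_date")
     .bucketBy(32, "user_id")
     .sortBy("user_id")
     .format("parquet")
     .mode("overwrite")
     .saveAsTable("analytics.web_events_bucketed"))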

Environment: Cloudera CDH4.3, Hadoop, Pig, Hive, MapReduce, HDFS, Sqoop, Impala, Tableau, Flume, Oozie, Linux.

Confidential

Data Analyst

Responsibilities:

  • Gathered the Sales Analysis report prototypes from business analysts belonging to different business units.
  • Worked with master SSIS packages to execute a set of packages that load data from various sources into the data warehouse on a scheduled basis.
  • Involved in data Extraction, Transformation, and Loading (ETL) from source systems.
  • Responsible for ETL design: identifying the source systems, designing source-to-target relationships, data cleansing, data quality, creating source specifications, and writing ETL design documents.
  • Data received from legacy customer information systems was cleansed and then transformed into staging tables and target tables in DB2.
  • Used external tables to transform and load data from legacy systems into target tables; used data transformation tools such as DTS, SSIS, Informatica, and DataStage.
  • Conducted design reviews with the business analysts, content developers, and DBAs.
  • Designed, developed, and maintained the enterprise data architecture for enterprise data management, including business intelligence systems, data governance, data quality, enterprise metadata tools, data modeling, data integration, operational data stores, data marts, data warehouses, and data standards.
  • Performed incremental loading of the fact table from the source system into the staging table on a daily basis (a minimal sketch follows this list).
  • Coded SQL stored procedures and triggers.
  • Used various transformations in SSIS Data Flow and Control Flow, using For Loop containers and Fuzzy Lookups, and implemented event handlers and error handling in SSIS packages.
  • Involved in Cloudera Navigator access for auditing and viewing data.
  • Extracted tables from various databases for code review.
  • Generated document coding to create metadata names for database tables.
  • Analyzed metadata and table data for comparison and confirmation.
  • Adhered to document deadlines for assigned databases.
  • Ran routine reports on a scheduled basis as well as ad hoc reports based on key point indicators.
  • Developed DataStage jobs to cleanse, transform, and load data into the data warehouse, and sequencers to encapsulate the DataStage job flow.
  • Designed data visualizations to analyze and communicate findings.
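
As an illustration of the daily incremental fact load described above, here is a minimal Python sketch that issues a watermark-based insert against SQL Server via pyodbc; the connection string, schema, table, and column names are hypothetical placeholders, and the original implementation used SSIS/DataStage jobs rather than Python.

    import pyodbc

    # Sketch of a daily incremental load from a staging table into a fact table.
    # Connection string, table names, and the watermark column are hypothetical;
    # the original project implemented this logic in SSIS / DataStage.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=example-server;"
        "DATABASE=SalesDW;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()

    # Insert only the staging rows that arrived after the last loaded date.
    cursor.execute("""
        INSERT INTO dbo.FactSales (OrderID, CustomerKey, Amount, OrderDate)
        SELECT s.OrderID, s.CustomerKey, s.Amount, s.OrderDate
        FROM staging.Sales AS s
        WHERE s.OrderDate > (SELECT ISNULL(MAX(OrderDate), '1900-01-01') FROM dbo.FactSales)
    """)
    conn.commit()
    conn.close()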

Environment: Linux, Erwin, SQL Server 2000/2005, Crystal Reports 9.0, HTML, Data Stage Version 7.0, Oracle, Toad, MS Excel, Pow

Confidential

Java Developer

Responsibilities:

  • Utilized Java, Java EE, JSP, Apache Server, and SQL.
  • Used Spring Cloud Gateway to dispatch different URLs to the proper backend microservices.
  • Created roles for customers based on their designation and teams.
  • Responsible for creating front-end applications and user-interactive (UI) web pages using web technologies such as HTML5, CSS3, and JavaScript.
  • Designed and developed microservice business components using Spring Boot.
  • Implemented Spring Boot services in combination with an Angular front end to form a microservice-oriented application.
  • Wrote JUnit test cases for the classes post development.
  • Using SVN, delivered and pushed code to Integration and QA environments on time for BA and QA sign-offs.
  • Extreme attention to accuracy, detail, presentation, and timeliness of delivery.
  • Used PCF for monitoring application stability and continuous integration.
  • Debugged production issues, performed root cause analysis, and delivered fixes.
  • Involved in setting up the Maven configuration and helping resolve Continuous Integration (CI) issues.
  • Extensively used SVN as the version control tool for check-ins and check-outs.
  • Involved in debugging defects, code review, and analysis of performance issues.
  • Designed and developed REST-based microservices using Spring Boot.
  • Designed, developed, and delivered the REST APIs necessary to support new feature development and enhancements in an agile environment.

Environment: Java/J2ee, JSP, SpringBoot, Hibernate, SOAP, REST, Junit, Log4j, SOAPUI, HTML5, CSS, JavaScript, Unix shell Scripting.

Confidential

Information Technology Intern

Responsibilities:

  • Developed and released the Intellect Brokerage product to Confidential .
  • The Intellect Invest-Brokerage product, part of the Intellect suite of products, is a back-office and retail brokerage solutions system that enables banks to offer brokerage distribution services to their retail clients.
  • The system works as an independent product processor and needs to interface with other products/systems, from the Intellect suite or other third parties, for accessing the cash accounts.
  • A set of processes was also designed specifically to cover the underlying processes that the Intellect Brokerage product would service.
  • Created the CustCom and credit interfaces for Account Opening, Account Modification and Account Closure, Account Block/Unblock, Buy, Sell, Transfer Out, Transfer In, Transfer Out-In, Dividends and Interest, Corporate Actions, Amend Portfolio, and Cancellation of Orders.
  • Created a communication API for interacting with the core banking product to facilitate the money flow.
  • Generated DDL scripts and wrote DML scripts for the Oracle database.
  • Applied design patterns and OO design concepts to improve the existing Java/JEE code base.
  • Actively involved in writing SQL using SQL Query Builder; coordinated onshore/offshore development and mentored new team members.
  • Extensively used Ant to build and configure J2EE applications and used Log4j for logging in the application.
  • Involved in fixing defects and unit testing with test cases using JUnit.
  • Used Ant scripts to build the application and deployed it on WebSphere Application Server.
  • Built scripts using Ant that compile the code, pre-compile the JSPs, build an EAR file, and deploy the application on the application server.

Environment: Java, J2EE, HTML, CSS, JavaScript, Ajax, Servlets, Struts, JSP, Multi-threading, XML, EJB, ANT, JDBC, Oracle, UML, Agile Methodology, WebSphere Application Server and STS.
