Hadoop Developer Resume
Charlotte, NC
SUMMARY
- 8 years of overall experience with strong emphasis on Development, Design, Implementation and Testing of software applications in Hadoop, HDFS, MapReduce, Spark, Pig, Hive, Sqoop, Kafka, Oozie and Zookeeper
- Experience in writing Spark applications using Python (PySpark) and Scala
- Data extraction, transformation and loading (ETL) using Pig, Hive, Sqoop and HBase
- Acumen in data migration from RDBMS to the Hadoop platform using Sqoop, as well as in designing and developing applications
- Hands on experience with AWS services (VPC, EC2, S3, RDS, Redshift, Data Pipeline, EMR, DynamoDB, Lambda, SNS, SQS)
- Experience in writing custom UDFs for Hive to incorporate Java methods and functionality into HiveQL (HQL)
- Strong experience in core Java, J2EE, SQL
- Experience in migrating data from Hadoop/Hive/HBase to DynamoDB using Java automation
- Expertise in streaming data ingestion and processing
- Acumen in choosing the right Hadoop ecosystem components and providing efficient solutions for Big Data problems
- Well versed in designing and developing Big data systems
- Hands on experience in configuring Zookeeper to coordinate servers in clusters and maintain data consistency
- Experience in configuring and working with Flume to direct data from multiple sources directly into Hadoop
- Expertise in migrating ETL transformations using Pig Latin scripts and join operators
- Hands on experience in handling relational databases such as DB2, MySQL and SQL Server
- Knowledge in all phases of the SDLC, including Analysis, Design, Development, Implementation, Debugging and Software Testing in a client-server environment
- Basic experience in the AWS Big Data stack - EMR, S3, Glue, etc.
- Experienced in implementing projects using Agile and Waterfall methodologies and applying Design Patterns
- Well versed with Sprint ceremonies carried out in the Agile methodology
- Imported data from AWS S3 into Spark RDD
- Good knowledge on Machine Learning algorithms
- Strong Analytical skills and problem-solving capabilities with good communication and interpersonal skills
- Good team player with high motivation
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, MapReduce, Pig, HBase, Spark, YARN, Kafka, Hive, Flume, Sqoop, Oozie and Zookeeper
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, Amazon EMR
Languages: Python, Java, SQL, Scala, C/C++ and Linux shell scripting
ETL Tools: Talend, Informatica
Methodology: Agile, Waterfall and Design Patterns
Web Design Tools: HTML, XML, JavaScript, CSS, JSON
Development / Build Tools: Eclipse, Maven, IntelliJ
DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle
RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL and IBM DB2
Operating systems: UNIX, Linux, Mac OS and Windows variants
Data analytical tools: R, Pandas, NumPy, MATLAB, IBM SPSS
NoSQL Databases: HBase, Cassandra
Cloud Technologies: Amazon Web Services - EC2, RDS, S3, EMR, Glue, Lambda
Machine Learning/Data Science: Logistic Regression, Linear Regression, SVM, KNN, Decision Trees, Random Forests, K-Means, Dimensionality Reduction
Data Visualization Tools: Tableau, Power BI, Excel, RStudio (ggplot, ggplot2)
PROFESSIONAL EXPERIENCE
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Developing MapReduce jobs to parse raw data, populate tables and store the processed data in partitioned tables in the Enterprise Data Warehouse
- Scripting Hive queries for ad hoc data analysis before promoting the data into the ongoing database
- Implementing partitioning and bucketing in Hive to manage external tables and optimize performance (a minimal sketch appears after this list)
- Generating a real-time feed using Kafka and Spark Streaming and transforming it into Parquet-formatted DataFrames in HDFS (a streaming sketch also follows this list)
- Deployed the application to GCP using Spinnaker (rpm based)
- Launched a multi-node Kubernetes cluster in Google Kubernetes Engine (GKE) and migrated the Dockerized application from AWS to GCP
- Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators
- Experience in GCP Dataproc, GCS, Cloud Functions and BigQuery
- Experience in moving data between GCP and Azure using Azure Data Factory.
- Experience in building Power BI reports on Azure Analysis Services for better performance
- Used the Cloud Shell SDK in GCP to configure the Dataproc, Storage and BigQuery services
- Developed microservices based on RESTful web services using Akka Actors and the Akka HTTP framework in Scala to handle high concurrency and high traffic volumes
- Developed REST based Scala service to pull data from ElasticSearch/Lucene dashboard, Splunk and Atlassian Jira
- Installed, configured, monitored and maintained Hadoop cluster on Big Data platform.
- Implemented solutions for ingesting data from various sources and processing the data utilizing Big Data technologies such as Hive, Pig, Sqoop, HBase, MapReduce, etc.
- Worked on Big Data integration and analytics based on Hadoop, SOLR, Spark, Kafka, Storm and webMethods
- Worked on Big Data streaming analytics for building predictive Machine Learning models using Scala, Python and R
- Implemented server-to-server (S2S) token-based authentication using Java to access remote REST APIs
- Developed microservices using Spring Boot and core Java/J2EE, hosted on AWS, to be called by the Confidential Fios mobile app
- Developed a native Scala/Java library using JSch to remotely execute Auto Logs Perl scripts
- Created and implemented a custom grid layout using the CSS Grid system and the jQuery JavaScript library
- Developed complex JIRA automation, including project workflows, screen schemes, permission schemes, notification schemes and Jira Event Listener API triggers, using the Atlassian Jira Plugin API in core Java and Adaptavist ScriptRunner Groovy scripts
- Developed Spark applications using Scala and Java and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources
- Developed Spark programs using the Scala and Java APIs and performed transformations and actions on RDDs
- Highly knowledgeable in Hadoop administration, with extensive experience building, configuring and administering large data clusters in Big Data environments using the Apache distribution
- Experienced in processing Big data on the Apache Hadoop framework using MapReduce programs.
- Experience in working with Windows, UNIX/LINUX platform with different technologies such as Big Data, SQL, XML, HTML, Core Java, Shell Scripting etc.
- Migrated existing MapReduce programs to Spark using Scala and Python
- Extensively worked on Python and built a custom ingest framework
- Worked on REST APIs using Python
- Worked closely with the Architect; enhanced and optimized product Spark and Python code to aggregate, group and run data mining tasks using the Spark framework
- Developed testing scripts in Python, prepared test procedures, analyzed test result data and suggested improvements to the system and software
- Performed SQL queries on AWS with Athena and Redshift
- Worked on ETL migration services by developing and deploying AWS Lambda functions to build a serverless data pipeline that writes to the Glue Catalog and can be queried from Athena
- Migrated HBase data from an in-house data center to AWS DynamoDB using the Java API
- Responsible for API design and implementation for exposing data to/from DynamoDB
- Used Enterprise JavaBeans (EJB) session beans in developing business-layer APIs
- Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop as well as system specific jobs.
- Experience in developing Spark Application using Spark SQL and Python for faster and accurate data processing
- Processing events and records from Kafka by writing Spark applications in Scala
- Used Spark as the execution engine for performing data analytics in the Hive environment
- Solved performance issues in Pig scripts with an understanding of joins, groups and aggregations, and converted them to run as MapReduce jobs
- Usage of HiveQL instead of the embedded Derby database to work in a shared Hive environment when the timeline is critical
- Experience in using SequenceFile, RCFile and Avro formats during the refresh stage
- Developing Oozie workflows for initiating and scheduling the ETL process
- Creating Oozie workflows for regular incremental loads from Teradata and importing the data into Hive tables
- Experience in spinning up AWS EC2-Classic and EC2-VPC instances using CloudFormation templates
- Developed Schedulers to communicate with AWS to retrieve data
- Managing and maintaining schedules for ETL pipelines on Glue
- Developing Bash scripts to direct log files from the FTP server into Hive tables
- Imported metadata into Hive and moved the existing Hive tables and ongoing applications to Amazon AWS cloud services for development
- Developing MapReduce jobs that adhere to the SLA protocols
- Moving data from HDFS to Cassandra using the BulkOutputFormat class in the MapReduce job
- Very good experience with both MapReduce 1 (JobTracker) and MapReduce 2 (YARN) setups
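The Hive partitioning and bucketing work called out above can be illustrated with a short PySpark/Hive sketch. It is a minimal, hypothetical example, not the production code: the table, columns, staging source and HDFS path are placeholders, and bucketing is noted only in a comment.

```python
from pyspark.sql import SparkSession

# Minimal, hypothetical sketch: table, column and path names are placeholders.
spark = (SparkSession.builder
         .appName("hive-partitioning-sketch")
         .enableHiveSupport()        # required so Spark SQL talks to the Hive metastore
         .getOrCreate())

# External table partitioned by load date. Bucketing would be added in the
# Hive DDL with CLUSTERED BY (customer_id) INTO n BUCKETS; it is omitted here
# because Spark's support for Hive bucketed tables varies by version.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS edw.transactions (
        txn_id      STRING,
        customer_id STRING,
        amount      DOUBLE
    )
    PARTITIONED BY (load_dt STRING)
    STORED AS PARQUET
    LOCATION '/data/edw/transactions'
""")

# Enable dynamic partitioning so each day's data lands in its own partition.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

# Load from a (hypothetical) staging table; the partition column goes last.
spark.sql("""
    INSERT OVERWRITE TABLE edw.transactions PARTITION (load_dt)
    SELECT txn_id, customer_id, amount, load_dt
    FROM edw.transactions_staging
""")
```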
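The Kafka-to-Parquet feed can be sketched with Spark Structured Streaming in PySpark. Broker addresses, topic name, schema and paths are hypothetical, and the original pipeline may well have used the older DStream-based Spark Streaming API rather than this form.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Requires the spark-sql-kafka-0-10 connector on the classpath.
spark = SparkSession.builder.appName("kafka-to-parquet-sketch").getOrCreate()

# Hypothetical event schema for the JSON payload on the topic.
event_schema = StructType([
    StructField("txn_id", StringType()),
    StructField("customer_id", StringType()),
    StructField("amount", DoubleType()),
])

# Subscribe to the (hypothetical) Kafka topic.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
       .option("subscribe", "transactions")
       .load())

# Kafka delivers raw bytes; decode the value column and parse the JSON payload.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
             .select(from_json(col("json"), event_schema).alias("e"))
             .select("e.*"))

# Continuously append the parsed records to HDFS as Parquet.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/landing/transactions")
         .option("checkpointLocation", "hdfs:///checkpoints/transactions")
         .outputMode("append")
         .start())

query.awaitTermination()
```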
Confidential, Denver, CO
Hadoop Developer
Responsibilities:
- Analysis, troubleshooting and development on the Hadoop cluster using various Big Data analytical tools such as Spark, Pig, Hive, Scala, Tez and Kafka
- Construction of scalable distributed data solutions using Hadoop
- Performing data analytics on Hive with the help of the Spark API over Hortonworks Hadoop YARN
- Faster testing and processing of data through Spark code using Scala and Spark SQL/Streaming
- Implemented text analytics and processing with in-memory capabilities of Apache Spark written in Python
- Importing and exporting Teradata data using Sqoop, from HDFS to the RDBMS and vice versa
- Extraction, Transformation and Loading (ETL) of data from multiple sources like Databases, XML files and Flat Files
- Imported data from AWS S3 into Spark RDD
- Worked closely with the Architect; enhanced and optimized product Spark and Python code to aggregate, group and run data mining tasks using the Spark framework
- Developed Spark programs using the Scala and Java APIs and performed transformations and actions on RDDs
- Developing UDFs in Java for Hive and Pig and working on reading multiple data formats on HDFS using Scala
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Used Scala collection framework to store and process the complex consumer information.
- Used Scala functional programming concepts to develop business logic.
- Developed programs in Java and Scala-Spark for data reformatting after extraction from HDFS for analysis
- Developed testing scripts in Python, prepared test procedures, analyzed test result data and suggested improvements to the system and software
- Performed SQL queries on AWS with Athena and Redshift
- Worked on ETL migration services by developing and deploying AWS Lambda functions to build a serverless data pipeline that writes to the Glue Catalog and can be queried from Athena
- Directing data from an AWS S3 bucket through Spark Streaming to perform real-time transformations and aggregations, build the data model and send it to HDFS (a sketch follows this list)
- Performing incremental imports by creating Sqoop metastore jobs
- Managing and Reviewing HBase log files
- Writing MapReduce jobs to run on EMR clusters and managing the workflow for running other jobs
- Extracting analytics reports using EMR jobs run on an Amazon VPC cluster
- Designed and implemented Hive queries to perform filtering, evaluation, loading and storing of data
- Involved in performance tuning of the ETL process by addressing various performance issues at the extraction and transformation stages
- Migrated HBase data from an in-house data center to AWS DynamoDB using the Java API
- Responsible for API design and implementation for exposing data to/from DynamoDB
- Used Enterprise JavaBeans (EJB) session beans in developing business-layer APIs
- Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop as well as system specific jobs.
- Exported analyzed data to local databases using Sqoop to create visualization dashboards and generate reports for the managers
- Data warehousing using Hive and managing hive tables
- Working with Spark, which provides a fast, general engine for processing Big Data, integrated with Python programming
- Created and managed technical documentation for launching Hadoop clusters and constructing visualization dashboard templates for quarterly analysis
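The S3 ingestion and aggregation work described in this list can be illustrated with a minimal PySpark sketch. The bucket, columns and output path are hypothetical, and the production job used Spark Streaming rather than this simplified batch form.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import count, sum as sum_

# Assumes the cluster's Hadoop/S3A configuration supplies AWS credentials.
spark = SparkSession.builder.appName("s3-to-hdfs-sketch").getOrCreate()

# Read raw CSV from a hypothetical S3 bucket; the same path also works with
# spark.sparkContext.textFile() when a plain RDD is preferred over a DataFrame.
orders = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("s3a://example-bucket/raw/orders/"))

# Group and aggregate to build the data model.
daily = (orders.groupBy("order_date", "region")
               .agg(count("*").alias("order_count"),
                    sum_("amount").alias("total_amount")))

# Persist the aggregated model to HDFS as Parquet.
daily.write.mode("overwrite").parquet("hdfs:///data/model/daily_orders")
```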
Confidential
SQL Developer
Responsibilities:
- Developed SQL queries to generate and store data reports.
- Configured SQL jobs and maintenance plans to ensure database stability and integrity
- Coded and tested SQL queries against the tables using inline views, filters, merge statements and dynamic SQL statements, and monitored indexes to bring down processing time
- Ensuring several functions such as filters, randomizations and stratifications are active in the database
- SQL performance tuning by modifying indexes, setting transaction isolation levels and restructuring queries with inline calculations to replace sub-query-based functions
- Generating data from multiple database servers such as Oracle, DB2 and Access and connecting them using the SSIS tool
- Building an ETL process to move data from source database servers to the destination using the SSIS package, VBA and the Export/Import Wizard
- Used SSIS control flow components such as the Execute SQL Task, Foreach Loop Container, Script Task and File System Task to perform ETL functions
- Used SSRS to generate formatted reports with stored procedures and expressions.
- Designed SSRS reports providing visualization of the data
- Developing a series of automations using SQL, SSRS and Report Manager to generate production-formatted reports
Confidential
Data Analyst Intern
Responsibilities:
- Data extraction, compilation and tracking to generate reports post analysis
- Standardized SQL, SAS and MicroStrategy based data management infrastructure to support Market advantage
- Planning and coordinating the administration of PostgreSQL databases to ensure data accuracy and effective use of data within the database, covering the definition and structure of the build and operational guidelines
- Performing SQL queries to maintain and manage the data on a weekly to monthly basis, depending on SLA terms
- Predictive modeling using RStudio and performing time series analysis and time-to-event analysis to record changes in the market
- Data visualization using R packages such as ggplot and ggplot2 to generate dashboards for team meetings
- Developed optimized data and qualifying procedures
- Analyzed data using Excel and MicroStrategy to generate suggestions for business decisions
- Continuously engage with senior application analysts to understand procedures & functional data reconciliation requirements to design and develop changes within the tool
- Utilized Microsoft Excel to categorize reports into a detailed pivot table to develop improved insight deriving strategy
- Redesigned the data mart using extraction, transformation and loading (ETL) from various platforms to produce deeper insights and support effective decision making, and documented the whole process for future use
- Took part in automating data imports from the external environment using the SSIS package, which decreased time consumption from days to hours