Sr. Hadoop Developer Resume Cleveland, OH - Hire IT People

SUMMARY:

Experienced Hadoop developer with over 7 years of programming experience and 5 years of proficiency in the Hadoop ecosystem and Big Data systems.
In-depth experience and solid subject knowledge of HDFS, MapReduce, Hive, Pig, Sqoop, YARN/MRv2, Spark, Kafka, Impala, HBase and Oozie.
Good experience working with the Hortonworks, Cloudera and MapR distributions.
Substantial experience writing MapReduce jobs in Java and working with Pig, Flume, Tez and Hive.
Used Spark DataFrames, Spark SQL and the Spark RDD API for various data transformations and dataset building.
Extensively worked on Spark Streaming and Apache Kafka to fetch live stream data.
Strong fundamental understanding of distributed computing and distributed storage concepts for highly scalable data engineering.
Worked with Pig and Hive and developed custom UDFs for building various datasets.
Worked on MapReduce framework using Java programming language extensively.
Strong experience troubleshooting and performance tuning Spark, MapReduce and Hive applications.
Worked extensively with clickstream data to derive visitor behavioral patterns and enable the data science team to run various predictive models.
Worked on NoSQL data stores, primarily HBase, using the HBase Java API and Hive integration.
Experienced in checking cluster status with monitoring tools such as Cloudera Manager and Ambari.
Implemented dynamic partitions and buckets in Hive for efficient data access.
Significant experience working with cloud environments such as Amazon Web Services (AWS) EC2 and S3.
Strong expertise in Unix shell script programming.
Expertise in creating shell scripts and regular expressions.
Skilled in visualizing data using Tableau, Power BI and MS Excel.
Knowledge of Enterprise Data Warehouse (EDW) architecture, Teradata, and data modeling concepts such as star schema and snowflake schema.
Highly proficient in Scala programming.
Experience with web technologies including HTML, CSS, JavaScript, Ajax and JSON, and frameworks such as J2EE, AngularJS and Spring.
Good knowledge of REST web services, SOAP programming, WSDL, XML parsers such as SAX and DOM, AngularJS, and responsive design with Bootstrap.
Familiar with Agile and Waterfall methodologies. Handled several client-facing meetings with strong communication skills.
Good experience in a customer support role, resolving production issues based on priority.

TECHNICAL SKILLS:

Hadoop/Big Data Ecosystems: MapReduce, Spark, Spark SQL, PySpark, SparkR, Pig, Hive, Sqoop, HBase, Flume, Kafka, Cassandra, YARN, Oozie, ZooKeeper, Elasticsearch

Languages: C, C++, Java, Scala, Python, C#, SQL, PL/SQL

Frameworks: J2EE, Spring, Hibernate, Angular JS

Cluster Management and Monitoring: Cloudera Manager, Hortonworks Ambari

Databases: Oracle 11g, MySQL, SQL Server

Development Tools: Eclipse, NetBeans, Visual Studio, IntelliJ IDEA, XCode

Build Tools: ANT, Maven, sbt, Jenkins

Application Servers: Tomcat 6.0, WebSphere 7.0

Business Intelligence Tools: Tableau, Splunk, PowerBI

Version Control: GitHub, Bitbucket, SVN

WORK EXPERIENCE:

Sr. Hadoop Developer

Confidential, Cleveland, OH

Responsibilities:

Gathered user requirements and designed technical and functional specifications.
Installed, configured and maintained Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper and Sqoop.
Loaded data from different sources such as Teradata and DB2 into HDFS using Sqoop, then into partitioned Hive tables.
Created Hive tables and wrote Hive queries for data analysis to meet business requirements; used Sqoop to import and export data from Oracle and MySQL.
Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs) and User Defined Aggregating Functions (UDAFs) written in Python.
Imported and exported the analysed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
Worked on importing and exporting data into HDFS and Hive using Sqoop.
Used Flume to handle streaming data and loaded the data into the Hadoop cluster.
Developed and executed Hive queries for de-normalizing the data.
Developed an Apache Storm, Kafka and HDFS integration project for real-time data analysis.
Executed Hive queries using the Hive command line, the Hue web GUI and Impala to read, write and query data in HBase.
Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, with a view to adopting Impala in the project.
Worked on a 130-node cluster.
Designed an Apache Airflow entity resolution module for data ingestion into Microsoft SQL Server.
Developed a batch processing pipeline to process data using Python and Airflow. Scheduled Spark jobs using Airflow.
Involved in writing, testing, and running MapReduce pipelines using Apache Crunch.
Managed and reviewed Hadoop log files, analysed SQL scripts and designed a Spark-based solution for the process.
Created reports in Tableau to visualize the data sets created, and tested native Drill, Impala and Spark connectors.
Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks and performance analysis.
Imported data from different sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the results into HDFS.
Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.

Environment: Hadoop, YARN, HBase, Teradata, DB2, NoSQL, Kafka, Python, ZooKeeper, Oozie, Tableau, Apache Crunch, Apache Storm, MySQL, SQL Server, jQuery, JavaScript, HTML, Ajax and CSS.

Hadoop Developer

Confidential, Eagan, MN

Responsibilities:

Worked on a live 24-node Hadoop cluster running on HDP 2.2.
Built data import and export jobs to copy data between RDBMS and HDFS using Sqoop.
Used Sqoop jobs with incremental loads to populate HAWQ external tables and load them into internal tables.
Created external and internal tables using HAWQ.
Worked with the Spark Core, Spark Streaming and Spark SQL modules.
Hands-on experience in various Big Data application phases such as data ingestion, data analytics and data visualization.
Experience in transferring data from RDBMS to HDFS and Hive tables using Sqoop.
Migrated code from Hive to Apache Spark and Scala using Spark SQL and RDDs.
Well versed in workflow scheduling and monitoring tools such as Oozie, Hue and ZooKeeper.
Experience in working with Flume to load the log data from multiple sources directly into HDFS.
Installed and configured MapReduce, Hive and HDFS; implemented CDH5 and HDP clusters.
Assisted with performance tuning, monitoring, and troubleshooting.
Experience in data processing tasks such as collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
Experience in streaming data to clusters through Kafka and Spark Streaming.
Optimized HiveQL and Pig scripts by using execution engines such as Tez and Spark.
Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
Worked with AWS cloud services such as EC2, S3, EBS, RDS and VPC.
Experienced in reviewing Hadoop log files to detect failures.
Performed benchmarking of the NoSQL databases Cassandra and HBase.
Worked with Pig, Sqoop and the NoSQL database HBase for analysing the Hadoop cluster as well as big data.
Knowledge of workflow/schedulers like Oozie/crontab/Autosys.
Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
Created Hive tables and worked on them for data analysis to meet the business requirements.
Developed a data pipeline using Spark and Hive to ingest, transform and analyse data.
Experience in using the SequenceFile, RCFile, Avro and HAR file formats.
Hands-on experience writing Pig scripts to tokenize sensitive information using Protegrity.
Used Flume to dump the application server logs into HDFS.
Automated backups with Linux shell scripts that transfer data to an S3 bucket.
Experience in UNIX Shell scripting.
Hands-on experience using HP ALM. Created test cases and uploaded them into HP ALM.
Automated incremental loads to load data into the production cluster.

Environment: Hadoop, MapReduce, AWS, HDFS, Hive, HBASE, Sqoop, Pig, Flume, Oracle, Teradata, PL/SQL, Java, Shell Scripting, HP ALM.

Hadoop Developer

Confidential, St Louis, MO

Responsibilities:

Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and pre-processing.
Imported and exported data into HDFS and Hive using Sqoop.
Used multithreading, synchronization, caching and memory management.
Applied Java application development skills with object-oriented analysis and was extensively involved throughout the Software Development Life Cycle (SDLC).
Proactively monitored systems and services; contributed to architecture design and implementation of the Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
Built Big Data clusters using Apache Spark architecture for analytics.
Developed Pig Latin scripts for the analysis of semi-structured data. Developed industry-specific UDFs (user-defined functions).
Created Hive tables, loaded data and wrote Hive UDFs.
Used Sqoop to import data into HDFS and Hive from other data systems.
Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
Used Flume to collect, aggregate and store web log data from different sources such as web servers, mobile and network devices, and pushed it to HDFS.
Implemented partitioning, dynamic partitions and buckets in Hive.
Loaded and transformed large sets of structured, semi-structured and unstructured data.
Supported MapReduce programs running on the cluster.
Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
Involved in loading data from the UNIX file system to HDFS, configuring Hive and writing Hive UDFs.
Used Java and MySQL day to day to debug and fix issues with client processes.
Managed and reviewed log files.

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Spark, MongoDB, Flume, HTML, XML, SQL, MySQL, Core Java, Eclipse, Shell scripting, UNIX.

Big Data Engineer/Developer

Confidential

Responsibilities:

Developed several advanced MapReduce programs to process received data files.
Developed MapReduce programs for data analysis and data cleaning.
Firm knowledge of various summarization patterns for calculating aggregate statistical values over datasets.
Experience in implementing joins in the analysis of datasets to discover interesting relationships.
Fully involved in the requirement analysis phase.
Extended Hive and Pig core functionality by writing custom UDFs.
Worked on partitioning Hive tables and running the scripts in parallel to reduce their run time.
Strong expertise in internal and external Hive tables; created Hive tables to store the processed results in a tabular format.
Implemented partitioning, dynamic partitions and buckets in Hive.
Developed Pig Scripts and Pig UDFs to load data files into Hadoop.
Analyzed the data by performing Hive queries and running Pig scripts.
Developed Pig Latin scripts for the analysis of semi-structured and unstructured data.
Strong knowledge of the process of creating complex data pipelines using transformations, aggregations, cleansing and filtering.
Experience in writing cron jobs to run at regular intervals.
Developed MapReduce jobs for Log Analysis, Recommendation and Analytics.
Experience in using Flume to efficiently collect, aggregate and move large amounts of log data.
Involved in loading data from edge node to HDFS using shell scripting.
Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
Experience in managing and reviewing Hadoop log files.
Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.

Environment: Hadoop 1.1.1, Java, Apache Pig 0.10.0, Apache Hive 0.10.0, MapReduce, HDFS, Flume 1.4.0, GIT, UNIX Shell scripting, PostgreSQL, Linux.

Java Developer

Confidential

Responsibilities:

Involved in Analysis, Design, Implementation and Bug Fixing Activities.
Designed the initial Web-WAP pages for a better UI as per the requirements.
Involved in reviews of functional and technical specification documents and in code reviews.
Acquired the required domain knowledge.
Involved in the design of basic class diagrams, sequence diagrams and event diagrams as part of the documentation.
Held discussions and meetings with the business analysts to understand the functionality involved in test case reviews.
Developed SQL queries and Stored Procedures using PL/SQL to retrieve and insert into multiple database schemas.
Prepared the Support Guide containing the complete functionality.

Environment: Core Java, Apache Tomcat 5.1, Oracle 9i, JavaScript, HTML, PL/SQL, Rational Rose, Windows XP, UNIX.
