Hadoop Developer Resume
Albany, NY
PROFESSIONAL SUMMARY:
- 8+ years of professional experience, including around 3 years as a Java Developer and 5+ years in Big Data analytics as a Hadoop Developer.
- Experience in all phases of the data warehouse life cycle, including Requirement Analysis, Design, Coding, Testing, and Deployment.
- Experience in architecting, designing, installation, configuration, and management of Apache Hadoop Clusters & Cloudera Hadoop Distribution.
- Worked closely with QA and Operations teams to understand, design, and develop end-to-end data flow requirements.
- Utilizing Oozie to schedule workflows.
- Experienced in migrating HiveQL into Impala to minimize query response time.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Experience in managing the Hadoop infrastructure with Cloudera Manager.
- Practical knowledge of the functionality of each Hadoop daemon, the interactions between them, resource utilization, and dynamic tuning to keep the cluster available and efficient.
- Experience in understanding and managing Hadoop Log Files.
- Experience with Hadoop's multiple data processing engines, such as interactive SQL, real-time streaming, data science, and batch processing, handling data stored in a single platform on YARN.
- Experience in adding and removing nodes in a Hadoop cluster.
- Experience in extracting data from RDBMS into HDFS using Sqoop.
- Experience in collecting logs from log collectors into HDFS using Flume.
- Good understanding of NoSQL databases such as HBase.
- Experience in analyzing data in HDFS through Map Reduce, Hive and Pig.
- Designed, implemented, and reviewed features and enhancements to Cassandra.
- Experience on UNIX commands and Shell Scripting.
- Experience in Data Analysis, Data Cleansing (Scrubbing), Data Validation and Verification, Data Conversion, Data Migrations and Data Mining.
- Excellent interpersonal, communication, documentation, and presentation skills.
SKILLS:
Hadoop/Big Data: MapReduce, HDFS, Hive 2.3, Pig 0.17, HBase 1.2, Zookeeper 3.4, Sqoop 1.4, Oozie, Flume 1.8, Scala 2.12, Kafka 1.0, Storm, MongoDB 3.6, Hadoop 3.0, Spark, Cassandra 3.11, Impala 2.1
Database: Oracle 12c, MySQL, MS SQL Server, Teradata 15.
Web Tools: HTML 5.1, JavaScript, XML, ODBC, JDBC, Hibernate, JSP, Servlets, Java, Struts, Spring, and Avro.
Cloud Technology: Amazon Web Services (AWS), EC2, EC3, Elasticsearch, Microsoft Azure.
Languages: Java/J2EE, SQL, Shell Scripting, C/C++, Python
Java/J2EE Technologies: JDBC, JavaScript, JSP, Servlets, jQuery
IDE and Build Tools: Eclipse, NetBeans, MS Visual Studio, Ant, Maven, JIRA, Confluence; Version Control: Git, SVN, CVS
Operating System: Windows, Unix, Linux.
Tools: Eclipse, Maven, ANT, JUnit, Jenkins, SoapUI, Log4j
Scripting Languages: JavaScript, jQuery, AJAX, CSS, XML, DOM, SOAP, REST
RELATED EXPERIENCE:
Hadoop Developer
Confidential, Albany, NY
Responsibilities:
- Understood business requirements and was involved in preparing design documents according to client requirements.
- Analyzed Teradata procedures to prepare information on the individual queries.
- Developed Hive queries according to business requirements.
- Developed UDFs in Hive for functionality not covered by Hive's built-in functions.
- Developed a UDF for converting data from a Hive table to JSON format per client requirements (see the sketch at the end of this section).
- Implemented Dynamic partitioning and Bucketing in Hive as part of performance tuning.
- Implemented the workflow and coordinator files using Oozie framework to automate tasks.
- Involved in Unit, Integration, System Testing.
- Prepared unit test case documents and flow diagrams for all scripts used in the project.
- Scheduled and managed jobs on the Hadoop cluster using Oozie workflows.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Transformed unstructured data into structured data using Pig.
- Imported data using Sqoop to load data from MySQL into HDFS on a regular basis.
- Designed and developed Pig Latin scripts to process data in batches to perform trend analysis.
- Good experience on Hadoop tools like MapReduce, Hive and HBase.
- Worked on both External and Managed HIVE tables for optimized performance.
- Developed Hive scripts to meet analysts' requirements for analysis.
- Maintained data import scripts using Hive and MapReduce jobs.
- Performed data design and analysis to handle large volumes of data.
- Cross-examined data loaded into Hive tables against the source data in Oracle.
- Worked closely with QA and Operations teams to understand, design, and develop end-to-end data flow requirements.
- Utilized Oozie to schedule workflows.
- Developed structured, efficient, and error-free code for Big Data requirements using knowledge of Hadoop and its ecosystem.
- Stored, processed, and analyzed huge data sets to derive valuable insights.
Environment: HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, HBase, Flume, Linux, Java, Eclipse, Cassandra, PL/SQL, and UNIX Shell Scripting.
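The Hive-to-JSON UDF bullet above is illustrated by the minimal sketch below. It assumes a simple two-string-column input and a hypothetical class name; it is a sketch of the approach, not the exact production code.

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Minimal Hive UDF sketch: wraps two string columns into a small JSON object.
// Registered in Hive with: ADD JAR to_json_udf.jar;
//                          CREATE TEMPORARY FUNCTION to_json AS 'ToJsonUDF';
@Description(name = "to_json", value = "_FUNC_(key, value) - returns a simple JSON object")
public class ToJsonUDF extends UDF {

    public Text evaluate(Text key, Text value) {
        if (key == null || value == null) {
            return null; // propagate NULLs the way built-in Hive functions do
        }
        String json = "{\"" + escape(key.toString()) + "\":\"" + escape(value.toString()) + "\"}";
        return new Text(json);
    }

    // Escape backslashes and quotes so the output stays valid JSON.
    private String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Example usage from Hive, assuming the function was registered as to_json: SELECT to_json(name, city) FROM customers;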
Hadoop Engineer
Confidential, St. Louis, MO
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Wrote multiple MapReduce programs in Java for data analysis (a simplified sketch follows at the end of this section).
- Wrote MapReduce jobs using Pig Latin and the Java API.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes. Communicate and escalate issues appropriately.
- Responsible for architecting Hadoop clusters with Hortonworks distribution platform HDP 1.3.2 and Cloudera CDH4.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files on Hortonworks, MapR, and Cloudera clusters.
- Collected the logs from the physical machines and the OpenStack controller and integrated into HDFS using Flume.
- Load data from various data sources into HDFS using Kafka.
- Designed and presented a plan for a POC on Impala.
- Experienced in migrating HiveQL into Impala to minimize query response time.
- Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
- Worked on Sequence files, RC files, Map side joins, bucketing, partitioning for Hive performance enhancement and storage improvement.
- Performed extensive Data Mining applications using HIVE.
- Responsible for performing extensive data validation using Hive.
- Sqoop jobs, PIG and Hive scripts were created for data ingestion from relational databases to compare with historical data.
- Utilized Storm for processing large volumes of data.
- Used Pig as an ETL tool to do transformations, event joins, filtering, and some pre-aggregations.
- Used visualization tools such as Power View for Excel and Tableau for visualizing data and generating reports.
- Set up a Hadoop cluster on Amazon EC2 using Whirr for a POC.
- Implemented test scripts to support test driven development and continuous integration.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Java, Linux, Maven, Teradata, Zookeeper, SVN, Autosys, HBase.
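A simplified sketch of the kind of Java MapReduce analysis job referenced above. Counting records per status code in tab-delimited logs is an assumed example for illustration, not the actual production analysis.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Counts occurrences of the status-code field (assumed to be column 2 of a tab-delimited log).
public class StatusCodeCount {

    public static class LogMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text statusCode = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length > 2) {
                statusCode.set(fields[2]);
                context.write(statusCode, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "status code count");
        job.setJarByClass(StatusCodeCount.class);
        job.setMapperClass(LogMapper.class);
        job.setCombinerClass(SumReducer.class); // partial aggregation on the map side
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```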
Big Data Engineer
Confidential, Columbus, OH
Responsibilities:
- Used Sqoop and Java APIs to import data into Cassandra from different relational databases.
- Created tables in Cassandra and loaded large data sets of structured, semi-structured, and unstructured data from various data sources.
- Developed MapReduce jobs in Java for cleaning and preprocessing data.
- Wrote Python scripts for wrapper and utility automation.
- Implemented Storm builder topologies to perform cleansing operations before moving data into Cassandra.
- Worked on configuring Hive, Pig, Impala, Sqoop, Flume, and Oozie in Cloudera.
- Automated data movement between different Hadoop systems using Apache NiFi.
- Wrote MapReduce programs in Python using the Hadoop Streaming API.
- Worked on creating Hive tables, loading them with data, and writing Hive queries.
- Migrated ETL processes from SQL Server to Hadoop, using Pig for data manipulation.
- Developed Spark jobs using Scala in the test environment and used Spark SQL for querying.
- Worked on importing data from Oracle tables to HDFS and HBase tables using Sqoop.
- Wrote scripts to load data in to Spark RDDs and do in memory computations.
- Wrote a Spark Streaming script that consumes topics from Kafka, a distributed messaging source, and periodically pushes batches of data to Spark for real-time processing (see the sketch at the end of this section).
- Experience with Elasticsearch technologies and in creating custom Solr query components.
- Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions.
- Worked on different data sources such as Oracle, Netezza, MySQL, Flat files etc.
- Extensively used Sqoop to get data from RDBMS sources like Teradata and Netezza.
- Worked with Flume to load the log data from different sources into HDFS.
- Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
- Developed Talend jobs to move inbound files to HDFS file location based on monthly, weekly, daily, and hourly partitioning.
Environment: Cloudera, MapReduce, Spark SQL, Spark Streaming, Pig, Hive, Flume, Hue, Oozie, Java, Eclipse, Zookeeper, Cassandra, HBase, Talend, GitHub.
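The Spark jobs in this role were written in Scala; the sketch below shows the Spark Streaming-from-Kafka pattern described above in Java, for consistency with the other sketches. The broker address, topic name, consumer group, and batch interval are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaToSparkStreaming {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("kafka-stream-sketch");
        // Micro-batch every 30 seconds (assumed interval).
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");      // assumed broker address
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "spark-stream-sketch");        // assumed consumer group
        kafkaParams.put("auto.offset.reset", "latest");

        // Subscribe directly to the Kafka topic (direct stream, no receiver).
        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Arrays.asList("events"), kafkaParams)); // assumed topic name

        // Each 30-second batch is handed to Spark for processing; here we only count the records.
        stream.map(ConsumerRecord::value)
              .foreachRDD(rdd -> System.out.println("records in batch: " + rdd.count()));

        jssc.start();
        jssc.awaitTermination();
    }
}
```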
Hadoop Developer
Confidential
Responsibilities:
- Worked on writing transformer/mapping Map-Reduce pipelines using Java.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Involved in loading data into HBase using the HBase shell, the HBase client API, Pig, and Sqoop (see the sketch at the end of this section).
- Designed and implemented Incremental Imports into Hive tables.
- Deployed an Apache Solr search engine server to help speed up the search of the government cultural asset.
- Involved in collecting, aggregating, and moving data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse the logs and structure them in a tabular format to facilitate effective querying of the log data.
- Migrated ETL jobs to Pig scripts to do transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Worked with Avro Data Serialization system to work with JSON data formats.
- Worked on different file formats like Sequence files, XML files and Map files using Map Reduce Programs.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Pig scripts.
Environment: Hadoop, Big Data, HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, HBase, Flume, Linux, Java, Eclipse, Cassandra, Cloudera Hadoop Distribution, PL/SQL, Windows NT, UNIX Shell Scripting, and PuTTY.
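A minimal sketch of loading a row through the HBase client API, as mentioned above; the table, column family, qualifier names, and row key are assumptions for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseLoadExample {
    public static void main(String[] args) throws Exception {
        // Picks up hbase-site.xml from the classpath for the ZooKeeper quorum details.
        Configuration conf = HBaseConfiguration.create();

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("asset_catalog"))) { // assumed table name

            // One Put per row key; each addColumn call writes one cell in a column family.
            Put put = new Put(Bytes.toBytes("asset#0001"));
            put.addColumn(Bytes.toBytes("meta"), Bytes.toBytes("title"), Bytes.toBytes("sample title"));
            put.addColumn(Bytes.toBytes("meta"), Bytes.toBytes("source"), Bytes.toBytes("flume-ingest"));
            table.put(put);
        }
    }
}
```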
J2EE Developer
Confidential
Responsibilities:
- Responsible for the systems design, architecture, implementation, and integration with various technologies such as Spring Integration, Web Services, Oracle Advanced Queues, and WMQs.
- Implemented framework upgrades to Spring 3.0.5 and Spring Integration 2.0.5.
- Used OSGi container framework to install bundles (modules) developed using Spring and Spring Integration.
- Worked on UI development using JSP on Struts and Spring MVC Frameworks.
- Developed DAOs (Data Access Objects) and DOs (Data Objects) using Hibernate as the ORM to interact with the DBMS, Oracle (see the sketch at the end of this section).
- Developed modules that integrate with web services that provide global information.
- Used Log4j for application logging, tracing errors in the running system, and certain automated routine functions.
- Worked as a Web Dynpro Java developer, developing custom applications and creating Portal screens.
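A minimal sketch of the DAO pattern with Hibernate as the ORM, as described above; the Order entity, its fields, and the table name are hypothetical.

```java
// Order.java (hypothetical mapped entity, kept in its own file)
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "orders")
public class Order {
    @Id
    @GeneratedValue
    private Long id;
    private String status;
    // getters and setters omitted in this sketch
}
```

```java
// OrderDao.java - hides Hibernate session handling from the service layer
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

public class OrderDao {

    private final SessionFactory sessionFactory;

    public OrderDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    public Long save(Order order) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            Long id = (Long) session.save(order);   // INSERT mapped by the Order entity mapping
            tx.commit();
            return id;
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
        }
    }

    public Order findById(Long id) {
        Session session = sessionFactory.openSession();
        try {
            return (Order) session.get(Order.class, id);  // SELECT by primary key
        } finally {
            session.close();
        }
    }
}
```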
JAVA Developer
Confidential
Responsibilities:
- Analysis, design, and development of a J2EE-based application using Struts and Hibernate.
- Involved in interacting with the Business Analyst and Architect during the Sprint Planning Sessions.
- Implemented point-to-point JMS queues and MDBs to fetch diagnostic details across various interfaces (see the sketch at the end of this section).
- Worked with WebSphere business integration technologies such as WebSphere MQ and Message Broker 7.0 (middleware tools) on various operating systems.
- Performed incident resolution for WebSphere Application Server, WebSphere MQ, IBM Message Broker, and Process and Portal Servers.
- Configured WebSphere resources including JDBC providers, JDBC data sources, connection pooling, and JavaMail sessions. Deployed Session and Entity EJBs in WebSphere.
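A minimal sketch of a point-to-point JMS consumer implemented as a message-driven bean, as referenced above; the queue JNDI name and the payload handling are assumptions.

```java
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

// MDB listening on a point-to-point queue; the container delivers each message
// to exactly one consumer instance from the bean pool.
@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destination", propertyValue = "jms/DiagnosticsQueue") // assumed JNDI name
})
public class DiagnosticsMdb implements MessageListener {

    @Override
    public void onMessage(Message message) {
        try {
            if (message instanceof TextMessage) {
                String payload = ((TextMessage) message).getText();
                // Hand the diagnostic detail off to downstream processing (omitted in this sketch).
                System.out.println("Received diagnostic message: " + payload);
            }
        } catch (JMSException e) {
            // Rethrow as unchecked so the container can redeliver or dead-letter the message.
            throw new RuntimeException("Failed to read JMS message", e);
        }
    }
}
```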