Hadoop Developer Resume
Columbus, GA
SUMMARY:
- 5+ years of overall IT experience across a variety of industries, including hands-on experience in Big Data technologies.
- 3 years of comprehensive experience in Big Data processing using Apache Hadoop and its ecosystem (MapReduce, Pig, Hive, Sqoop, HBase, Spark, NoSQL, Oozie, Kafka, ZooKeeper, and Flume).
- In-depth understanding and knowledge of Hadoop architecture and its components, such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, and NodeManager.
- Knowledge of testing with Big Data technologies such as Hadoop, MapReduce, Hive, Pig, HBase, Kafka, and Spark.
- Hands-on experience installing, configuring, and testing ecosystem components such as Hadoop MapReduce, HDFS, HBase, ZooKeeper, Oozie, Hive, HDP, Cassandra, Sqoop, Pig, and Flume.
- Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
- Good experience in importing and exporting data between HDFS and relational database management systems using Sqoop.
- Experience in preparing test plans and executing test cases.
- Good experience and knowledge of testing processes for Hadoop-based application design and implementation.
- Good knowledge of Java for MapReduce testing.
- Experience in developing Pig Latin scripts and Hive queries.
- Experience in scripting for automation and monitoring using Python.
- Good knowledge of programming Spark using Scala; experienced with Spark SQL, Spark Streaming, and complex analytics using Spark over Cloudera Hadoop YARN.
- Implemented Spark using Scala and Spark SQL for faster processing and testing of data.
- Sound knowledge of workflow scheduling and cluster coordination tools such as Oozie and ZooKeeper, and of messaging with Kafka.
- Experienced in using ZooKeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
- Worked extensively with dimensional modeling, data migration, data cleansing, data profiling, and ETL processes for data warehouses.
- Expertise in writing MapReduce jobs in Java for processing large sets of structured, semi-structured, and unstructured data and storing the results in HDFS; a minimal sketch follows this summary.
- Experience working on NoSQL databases such as HBase.
- Strong knowledge and understanding of open-source software and network controllers.
- Ability to work effectively with associates at all levels within the organization.
- Strong background in mathematics with very good analytical and problem-solving skills.
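As a hedged illustration of the MapReduce-in-Java experience noted above, the following is a minimal sketch of a job that counts records per state in semi-structured, tab-delimited log lines. The class name (StateCountJob) and the assumption that the state code sits in the third column are illustrative only, not details of any specific project.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical example: count log records per state from tab-delimited input.
public class StateCountJob {

    public static class StateMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text state = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length > 2) {               // skip malformed lines
                state.set(fields[2]);              // assumed: state code in third column
                context.write(state, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "state-count");
        job.setJarByClass(StateCountJob.class);
        job.setMapperClass(StateMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}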
TECHNICAL SKILLS:
Hadoop Technologies: HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Flume, Oozie, Cassandra, YARN, Apache Spark, Impala, Kafka.
Hadoop Distribution: Cloudera CDH, Hortonworks HDP.
Programming Languages: Core Java, Python, SQL, C, HTML.
Database Systems: Oracle, MySQL, HBase, Cassandra
IDE Tools: Eclipse, NetBeans, IntelliJ
Monitoring Tools: Ambari, Cloudera Manager
Operating Systems: Windows, Linux, UNIX
PROFESSIONAL EXPERIENCE:
Confidential, Columbus, GA
Hadoop Developer
Responsibilities:
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Experienced in working with different Hadoop ecosystem components such as HDFS, MapReduce, HBase, Spark, YARN, Kafka, ZooKeeper, Pig, Hive, Sqoop, Storm, Oozie, Impala, and Flume.
- Importing and exporting data into HDFS from Relational databases and vice versa using Sqoop.
- In-depth understanding and knowledge of Hadoop architecture and its components, such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, and NodeManager.
- Created partitions and bucketing by state in Hive to handle structured data (see the Spark SQL sketch after this job's environment line).
- Implemented dashboards backed by HiveQL queries, including aggregation functions, basic Hive operations, and different kinds of join operations.
- Used Pig for three distinct workloads: pipelines, iterative processing, and research.
- Involved in moving all log files generated from various sources to HDFS for further processing through Kafka and Flume.
- Extensively used Pig to communicate with Hive via HCatalog and with HBase via storage handlers.
- Implemented MapReduce jobs to write data into Avro format (a minimal driver sketch follows this job's environment line).
- Created Hive tables to store the processed results in a tabular format.
- Implemented Spark using Scala and Spark SQL for faster processing and testing of data.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Implemented various MapReduce jobs in custom environments and loaded the results into HBase tables by generating Hive queries.
- Performed Sqoop operations for various file transfers through HBase tables for processing data into MongoDB.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala; good experience with spark-shell and Spark Streaming.
- Evaluated Oozie for workflow orchestration in the automation of MapReduce, Pig, and Hive jobs.
- Created tables, secondary indexes, join indexes in Teradata development Environment for testing.
- Extracted data from other databases through Sqoop, placed it in HDFS, and processed it.
- Captured the data logs from web server into HDFS using Flume & Splunk for analysis.
- Experienced in writing Pig scripts and Pig UDFs to pre-process the data for analysis.
Environment: HDFS, Hive, Pig, MapReduce, CDH, Spark, AVRO, Sqoop, Oozie, Flume, Teradata, Kafka, Scala, HBase, SQL, Talend, Java, Unix.
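The Avro-output MapReduce work mentioned above can be sketched as a minimal, map-only driver using the org.apache.avro.mapreduce API. The EventRecord schema, field layout (comma-delimited id and payload), and class names are illustrative assumptions, not details from the original project.

import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyOutputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical map-only job that converts delimited text records to Avro files.
public class TextToAvroJob {

    // Illustrative Avro schema: an "EventRecord" with two string fields.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"EventRecord\",\"fields\":["
        + "{\"name\":\"id\",\"type\":\"string\"},"
        + "{\"name\":\"payload\",\"type\":\"string\"}]}");

    public static class ToAvroMapper
            extends Mapper<LongWritable, Text, AvroKey<GenericRecord>, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split(",", 2);
            if (parts.length == 2) {                      // skip malformed lines
                GenericRecord record = new GenericData.Record(SCHEMA);
                record.put("id", parts[0]);
                record.put("payload", parts[1]);
                context.write(new AvroKey<>(record), NullWritable.get());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "text-to-avro");
        job.setJarByClass(TextToAvroJob.class);
        job.setMapperClass(ToAvroMapper.class);
        job.setNumReduceTasks(0);                          // map-only conversion
        AvroJob.setOutputKeySchema(job, SCHEMA);           // write Avro container files
        job.setOutputValueClass(NullWritable.class);
        job.setOutputFormatClass(AvroKeyOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}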
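Likewise, the Hive partitioning/bucketing and the Hive-to-Spark conversion work can be illustrated with a small Spark SQL sketch in Java. The table and column names (customer_txn, state, customer_id) are placeholders, and the sketch assumes the table already exists in the Hive metastore.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical sketch: a Hive-managed table, partitioned by state and bucketed by
// customer id, queried through Spark SQL instead of a plain Hive/MapReduce job.
//
// Illustrative Hive DDL (run in Hive/beeline):
//   CREATE TABLE customer_txn (txn_id STRING, customer_id STRING, amount DOUBLE)
//   PARTITIONED BY (state STRING)
//   CLUSTERED BY (customer_id) INTO 16 BUCKETS
//   STORED AS ORC;
public class HiveOnSparkSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("hive-on-spark-sketch")
                .enableHiveSupport()          // reuse the existing Hive metastore
                .getOrCreate();

        // A HiveQL aggregation expressed as a Spark SQL query.
        Dataset<Row> totalsByState = spark.sql(
                "SELECT state, COUNT(*) AS txn_count, SUM(amount) AS total_amount "
              + "FROM customer_txn GROUP BY state");

        totalsByState.show(20, false);
        spark.stop();
    }
}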
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Handling structured, semi-structured, and unstructured data.
- Worked extensively with Sqoop for importing and exporting the data from HDFS to Relational Database systems and vice-versa.
- Developed simple MapReduce jobs using Hive and Pig.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Extensively used Pig for data cleansing and created partitioned tables in Hive.
- Managed and reviewed Hadoop log files.
- Worked with Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and processing (see the mapper sketch after this job's environment line).
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Integrated Hive and HBase to perform queries using Impala.
- Responsible for managing data coming from different sources.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Mentored the analyst and test teams in writing Hive queries.
Environment: Hadoop, MapReduce, HDFS, Hive, HBase, Sqoop, Impala, Java (jdk1.6), Pig, Flume, Oracle 11/10g, MySQL, Eclipse, Java, Shell Scripting, SQL Developer, Putty, XML/HTML.
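A minimal sketch of the kind of data-cleaning mapper referenced above; the field layout and filtering rules (pipe-delimited records, dropping rows with a missing key) are hypothetical assumptions for illustration.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical cleaning mapper: trims fields, drops malformed or keyless rows,
// and re-emits records as normalized pipe-delimited lines (used in a map-only job).
public class CleanRecordsMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    private final Text cleaned = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\\|", -1);
        if (fields.length < 3 || fields[0].trim().isEmpty()) {
            return;                                     // skip malformed / keyless rows
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                out.append('|');
            }
            out.append(fields[i].trim().toLowerCase()); // normalize whitespace and case
        }
        cleaned.set(out.toString());
        context.write(cleaned, NullWritable.get());
    }
}

Wired into a map-only Job (setNumReduceTasks(0)) with text input and output formats, a mapper like this drops bad rows before the data is loaded into downstream Hive tables.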
Confidential
Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster and different big data analytics tools, including Pig, HBase, and Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop.
- Implemented a nine-node CDH3 Hadoop cluster on Red Hat Linux.
- Involved in loading data from LINUX file system to HDFS.
- Worked on installing the cluster, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Created HBase tables to store data in variable formats coming from different portfolios.
- Implemented a script to transmit sysprin information from Oracle to HBase using Sqoop.
- Implemented best income logic using Pig scripts and UDFs (a minimal UDF sketch follows this job's environment line).
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Responsible for managing data coming from various sources.
- Involved in loading data from UNIX file system to HDFS.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Provided cluster coordination services through ZooKeeper.
- Experience in managing and reviewing Hadoop log files.
- Job management using the Fair Scheduler.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
Environment: Hadoop, HDFS, Pig, Zookeeper, Sqoop, HBase, Shell Scripting, Ubuntu, Linux Red Hat.
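The Pig UDF work above can be illustrated with a minimal Java EvalFunc sketch; the name and logic (NormalizeIncome, parsing a currency string into a Double) are hypothetical stand-ins for the actual business rule.

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical Pig UDF: strips currency symbols and commas from a field and
// returns it as a Double, or null when the value cannot be parsed.
public class NormalizeIncome extends EvalFunc<Double> {

    @Override
    public Double exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        String raw = input.get(0).toString().replaceAll("[$,\\s]", "");
        try {
            return Double.valueOf(raw);
        } catch (NumberFormatException e) {
            return null;   // bad records are passed through as null
        }
    }
}

Registered from a jar in a Pig script and wrapped with DEFINE, a UDF like this would be applied inside a FOREACH ... GENERATE statement during cleansing.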
Confidential
Jr. Java Developer
Responsibilities:
- Involved in design and development phases of Software Development Life Cycle (SDLC).
- Involved in designing UML Use case diagrams, Class diagrams, and Sequence diagrams using Rational Rose.
- Followed agile methodology and SCRUM meetings to track, optimize and tailored features to customer needs.
- Developed the user interface using JSP, JSP tag libraries, and JavaScript to simplify the complexities of the application.
- Developed a Dojo based front end including forms and controls and programmed event handling.
- Created Action Classes which route submittals to appropriate EJB components and render retrieved information.
- Used Core java and object-oriented concepts.
- Used JDBC to connect to backend databases, Oracle and SQL Server 2005 (a minimal sketch follows this job's environment line).
- Proficient in writing SQL queries and stored procedures for multiple databases, Oracle and SQL Server 2005.
- Wrote stored procedures using PL/SQL. Performed query optimization to achieve faster indexing and make the system more scalable.
- Deployed the application on Windows using IBM WebSphere Application Server.
- Used Java Messaging Services (JMS) for reliable and asynchronous exchange of important information such as payment status report.
- Used Web Services - WSDL and REST for getting credit card information from third party.
- Used ANT scripts to build the application and deployed it on WebSphere Application Server.
Environment: Core Java, J2EE, Oracle, SQL Server, JSP, JDK, JavaScript, HTML, CSS, Web Services, Windows.
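A minimal sketch of the JDBC access pattern described above; the connection URL, credentials, and procedure name (get_payment_status) are placeholders, not details from the original system.

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;

// Hypothetical JDBC sketch: connect to Oracle and call a PL/SQL stored procedure.
public class PaymentStatusDao {

    public static void main(String[] args) throws Exception {
        String url = "jdbc:oracle:thin:@//dbhost:1521/ORCL";   // placeholder URL
        try (Connection conn = DriverManager.getConnection(url, "app_user", "app_pass");
             CallableStatement call = conn.prepareCall("{call get_payment_status(?, ?)}")) {
            call.setString(1, "ORDER-1001");              // IN: order id (illustrative)
            call.registerOutParameter(2, Types.VARCHAR);  // OUT: payment status
            call.execute();
            System.out.println("Status: " + call.getString(2));
        }
    }
}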