Hadoop Developer Resume
Oklahoma City, OK
SUMMARY:
- More than 6 years of work experience in application and product development across the full SDLC, primarily using Hadoop, Java/J2EE, Mainframe, and ETL technologies.
- Good experience with the Hadoop stack on Cloudera CDH and Core Java, including 2 years of comprehensive experience with the Hadoop ecosystem: MapReduce, HDFS, Spark, and AWS.
- Passionate about working in Big Data and Analytics environments.
- Proven skills in establishing strategic direction while staying technically strong in designing, implementing, and deploying; collected and translated business requirements into distributed architectures and robust, scalable designs.
- Experience in writing MapReduce programs using Apache Hadoop for working with Big Data.
- Experience in installing, configuring, supporting, and monitoring Hadoop clusters using Apache and Cloudera distributions and AWS.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Experience in using Pig, Hive, Sqoop, HBase, and Cloudera Manager.
- Extensive experience with big data ETL and query tools such as Pig Latin and HiveQL.
- Worked on the Kafka messaging system; able to ingest data from Kafka into Spark.
- Hands-on experience with big data ingestion tools such as Flume and Sqoop.
- Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
- Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
- Expertise in writing Hadoop jobs for analyzing data using Hive and Pig.
- Extended Hive and Pig core functionality by writing custom UDFs (a sketch follows this list).
- Experience with Cloudera CDH3, CDH4, and CDH5 distributions.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Experienced in integrating various data sources such as RDBMS, shell scripts, spreadsheets, and text files using Java 1.5.
- Familiar with the Java virtual machine (JVM) and multi-threaded processing.
- Set up standards and processes for Hadoop-based application design and implementation.
- Experience in managing and reviewing Hadoop log files.
- Extensive experience with SQL, PL/SQL, and database concepts.
- Worked on NoSQL databases including HBase and Cassandra.
- Knowledge of job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
- Experience in developing solutions to analyze large data sets efficiently.
- Experience in designing, developing, and implementing connectivity products that allow efficient exchange of data between the core database engine and the Hadoop ecosystem.
- Good understanding of XML methodologies (XML, XSL, XSD) including Web Services.
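To illustrate the custom-UDF work noted above, a minimal sketch of a Hive UDF in Java; the class, function, and column names are hypothetical, not from a specific project:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF that trims and lower-cases a string column.
// Registered in Hive with:
//   ADD JAR udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize AS 'com.example.NormalizeUDF';
//   SELECT normalize(raw_name) FROM customers;
public class NormalizeUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;  // preserve SQL NULL semantics
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```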
TECHNICAL SKILLS:
Hadoop/Spark/AWS: Hive, Sqoop, Pig, Puppet, Ambari, HBase, MongoDB, Cassandra, PowerPivot, Flume, Spark, AWS, Apache Storm
Java & J2EE Technologies: Core Java 1.5, Servlets 2.4
Operating Systems: Windows 95/98/2000/XP/Vista/7/8, Unix, Linux, Solaris
IDE Tools: Eclipse 3.2.2, NetBeans 6.1, RSA, RAD, Oracle WebLogic Workshop
Methodologies: Agile/ Scrum, Waterfall
Web Technologies: HTML, XML, JavaScript, AJAX, SOAP, WSDL
Programming/Scripting Languages: C, Java, SQL, Unix Shell Scripting, Python, Scala
Databases: Oracle 11g/10g/9i, MySQL, Teradata, MS SQL Server
PROFESSIONAL EXPERIENCE:
Confidential, Oklahoma City, OK
Hadoop Developer
Responsibilities:
- Participated in requirements gathering and converted the requirements into technical specifications.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Wrote streaming MapReduce programs in Java.
- Analyzed log files through Hive, loaded JSON-format data into Hive, and worked on external and internal tables and Hive optimization techniques (see the Hive sketch after this list).
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Set up instances on AWS and connected to them.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response (see the Spark sketch below).
- Stored data in Avro and Parquet formats.
- Deployed instances and connected to them from the console.
- Worked on Spark SQL and created data warehouses in both Spark and Hive.
- Wrote Spark RDDs to Hive using Spark SQL.
- Worked on Hadoop security using Kerberos.
- Responsible for managing data from multiple sources.
- Worked on web server logs and created data pipelines.
- Extracted data from Cassandra through Sqoop, placed it in HDFS, and processed it.
- Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data.
- Responsible for understanding the input feeds and expected outputs.
- Responsible for overseeing and writing scripts that prepared the data for the analysts.
- Gained good experience with NoSQL databases.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for loading data from Linux file systems to HDFS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
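A minimal Hive sketch of the external/internal table pattern from the list above, issued from Java through the HiveServer2 JDBC driver; the host, table names, and HDFS paths are hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class LogTableSetup {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {
            // External table over raw JSON log lines already landed in
            // HDFS; dropping it leaves the underlying files intact.
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS web_logs_raw "
                    + "(line STRING) LOCATION '/data/logs/raw'");
            // Internal (managed) table, partitioned by day so queries
            // can prune partitions -- a common Hive optimization.
            stmt.execute("CREATE TABLE IF NOT EXISTS web_logs "
                    + "(host STRING, url STRING, status INT) "
                    + "PARTITIONED BY (dt STRING) STORED AS PARQUET");
        }
    }
}
```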
Environment: Hadoop, Java (JDK 1.6), Hive, Pig, Sqoop, MapReduce, flat files, Oracle 11g/10g, MySQL, Linux, Spark, AWS
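A Spark sketch of the in-memory RDD computation mentioned in the responsibilities above; the HDFS paths and the log-line field positions are hypothetical assumptions:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class ErrorCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("ErrorCount");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> lines = sc.textFile("hdfs:///data/logs/raw");
        // Keep only 5xx responses and count them per URL; the whole
        // computation runs in memory across the cluster.
        JavaPairRDD<String, Integer> errorsByUrl = lines
                .map(line -> line.split(" "))
                .filter(f -> f.length > 8 && f[8].startsWith("5"))
                .mapToPair(f -> new Tuple2<>(f[6], 1))   // f[6] = URL
                .reduceByKey(Integer::sum);
        errorsByUrl.saveAsTextFile("hdfs:///data/logs/errors-by-url");
        sc.stop();
    }
}
```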
Confidential, Boston, MA
Hadoop Developer
Responsibilities:
- Worked on the proof of concept for the Apache Hadoop framework initiation.
- Involved in various phases of Software Development Life Cycle.
- Worked closely with various levels of individuals to coordinate and prioritize multiple projects.
- Worked in the BI team in the area of Big Data Hadoop cluster implementation and data integration in developing large-scale system software.
- Involved in source system analysis, data analysis, and data modeling for ETL (Extract, Transform, and Load).
- Worked on tuning Hive and Pig scripts to improve performance.
- Worked extensively in creating MapReduce jobs to power data for search and aggregation.
- Designed a data warehouse using Hive.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Handled structured and unstructured data and applied ETL processes.
- Moved data between database systems/mainframe and Hadoop in both directions, loading data into HDFS.
- Extensively used Pig for data cleansing; created partitioned tables in Hive.
- Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
- Developed Pig UDFs to pre-process the data for analysis (a sketch follows this list).
- Developed Hive queries for the analysts.
- Involved in database migrations to transfer data from one database to another and in complete virtualization of many client applications.
- Wrote build scripts using Ant and participated in the deployment of one or more production systems.
- Provided production rollout support, including monitoring the solution post go-live and resolving any issues discovered by the client and client services teams.
- Documented operational problems by following standards and procedures, using the JIRA reporting tool.
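A minimal sketch of a Pig UDF of the kind described above; the class name and the cleansing rule are hypothetical:

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical EvalFunc used during data cleansing: trims and
// upper-cases a chararray field. Invoked from Pig Latin with:
//   REGISTER udfs.jar;
//   cleaned = FOREACH raw GENERATE com.example.Normalize(name);
public class Normalize extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;  // pass nulls through unchanged
        }
        return ((String) input.get(0)).trim().toUpperCase();
    }
}
```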
Environment: Apache Hadoop, Java (JDK 1.6), DataStax, flat files, Oracle 11g/10g, MySQL, Toad 9.6, Windows XP, UNIX, Sqoop, Hive, Oozie.
Confidential, Dallas, TX
Hadoop Developer
Responsibilities:
- Imported and exported data into HDFS, Hive, and Pig using Sqoop.
- Responsible for managing data coming from different sources.
- Migrated databases, logic, and reporting systems built in Access and Excel to long-term automated systems using Spark.
- Built data pipelines using Pig and Java/Scala MapReduce to store data on HDFS.
- Designed an ETL data pipeline flow to ingest data from an RDBMS source into Hadoop using shell scripts, Sqoop, packages, and MySQL.
- Developed business logic using Scala.
- Responsible for loading data from UNIX file systems to HDFS. Installed and configured Hive and wrote Pig/Hive UDFs.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Wrote MapReduce programs to convert text files into Avro format and loaded them into Hive tables.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Developed design documents considering all possible approaches and identifying the best of them.
- Loaded data into HBase using both bulk and non-bulk loads (see the HBase sketch after this list).
- Developed scripts and automated data management end to end, including synchronization between all the clusters.
- Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, and sequence files from log files.
- Fine-tuned Pig queries for better performance.
- Involved in writing shell scripts for exporting log files to the Hadoop cluster through an automated process.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
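A non-bulk HBase load sketch matching the bullet above, written against the HBase 1.x client API; the table, column family, and row contents are hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class EventLoader {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("events"))) {
            Put put = new Put(Bytes.toBytes("row-20150101-0001"));
            put.addColumn(Bytes.toBytes("d"),              // column family
                          Bytes.toBytes("payload"),        // qualifier
                          Bytes.toBytes("{\"k\":\"v\"}")); // JSON value
            table.put(put);                                // single-row write
        }
    }
}
```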
Confidential
Java Developer
Responsibilities:
- Responsible for design, development, Java application architecture, use cases, flowcharts, application flow, prototypes, and proofs of concept with sample code.
- Responsible for writing detailed design specification document and implementing all business rules.
- Designed and developed web pages using HTML and JSPs with JSTL tags.
- Wrote data access components to perform DML operations using JDBC (see the JDBC sketch after this list).
- Developed complex PL/SQL queries and utilized stored procedures and triggers to interact with Oracle database.
- Was involved in regression testing of the application using JUnit.
- Wrote client-side form-based validations using JavaScript.
- Developed scripts for EJB deployment, build releases, and generating daily logs on both NT and UNIX. Involved in writing complex queries and stored procedures using SQL, PL/SQL, and Oracle.
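A minimal sketch of the JDBC data-access pattern described above; the connection URL, credentials, and table/column names are hypothetical, and try-with-resources is used for brevity even though the project era was JDK 1.3:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class AccountDao {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@dbhost:1521:orcl", "user", "pass")) {
            String sql = "UPDATE accounts SET balance = ? WHERE id = ?";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setDouble(1, 150.00);   // new balance
                ps.setLong(2, 42L);        // account id
                int rows = ps.executeUpdate();
                System.out.println(rows + " row(s) updated");
            }
        }
    }
}
```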
Environment: Java (JDK 1.3.x), EJB 1.1, JSP, WebSphere 4.x/5.x, Eclipse 3.1, WSAD 4.0/5.x, Oracle 8i on Windows 2000 and UNIX environments.
