Hadoop Developer Resume
Plano, TX
SUMMARY
- Around 8 years of solid experience in Object-Oriented Analysis and Big Data processing, spanning ingestion, storage, querying, and analysis.
- Worked with Hadoop and its ecosystem components such as HDFS, MapReduce, Sqoop, Flume, HBase, Oozie, Avro, HCatalog, Hive, and Pig, as well as Spark SQL and Spark Streaming using Scala (a representative sketch follows this summary).
- Very good experience designing and implementing MapReduce jobs in Java to support distributed processing of large data sets on the Hadoop cluster.
- Created and ran Sqoop jobs with incremental loads to populate Hive external tables, and migrated data between HDFS and relational databases (SQL Server, Oracle, DB2) in either direction per client requirements, applying MapReduce transformations as needed.
- Extensive experience writing Pig scripts to transform raw data from several data sources into baseline data sets.
- Hands-on experience with Cassandra, Flume, and Spark on YARN.
- Strong working experience with real-time data streaming using Spark with Kafka.
- Responsible for writing Hive queries to analyze terabytes of customer data from HBase and write the results to output files.
- Solid understanding of Hadoop MRv1 and MRv2 (YARN) architecture.
- Experienced in Extraction, Transformation, and Loading (ETL) processing based on business needs; extensively used the Oozie workflow engine to run multiple Hive and Pig jobs.
- Extensive experience writing Hive scripts for processing and analyzing large volumes of data and finding patterns and insights within structured and unstructured data.
- Implemented technical solutions for POCs, writing code in technologies such as Python.
- Developed UDFs in Java as needed for use in Pig and Hive queries.
- Experience in writing SQL queries, Stored Procedures, Triggers, Cursors and Packages.
- Experience using CSV, TSV, SequenceFile, RCFile, Avro, and HAR file formats.
- Great working knowledge of Data warehousing concepts and ETL processes.
- Experience in deploying and managing the Hadoop cluster using Cloudera Manager.
- Managed and scheduled job workflows on a Hadoop cluster using Oozie.
- Very good working experience with NoSQL databases: Cassandra, MongoDB, and HBase.
- Good experience with relational database technologies such as Oracle, SQL Server, and MySQL.
- Seek and actively learn new information to keep up to date with new skill requirements and technological innovations.
- Worked in large and small teams on systems requirements, design, and development.
- Key participant in all phases of the software development life cycle, including Analysis, Design, Development, Integration, Implementation, Debugging, and Testing of software applications in client/server, object-oriented environments.
- Experience using various IDEs, including Eclipse, IntelliJ IDEA, MyEclipse, and RAD.
- Self-motivated and responsible, with good time management, strong written, verbal, and listening skills, and a commitment to cooperative teamwork.
- Excellent communication skills and strong analytical and problem solving abilities.
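The Hive external table and Spark SQL work summarized above generally follows the pattern sketched below. This is a minimal, illustrative example only; the table name, columns, and HDFS paths are hypothetical placeholders rather than details from any specific engagement.

```scala
// Sketch: define a Hive external table over raw data landed in HDFS (e.g. by
// Sqoop or Flume) and aggregate it with Spark SQL. All names are illustrative.
import org.apache.spark.sql.SparkSession

object CustomerEventSummary {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CustomerEventSummary")
      .enableHiveSupport()               // allow Spark SQL to read/write Hive tables
      .getOrCreate()

    // External table over the raw files; dropping the table leaves the data in place
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS customer_events (
        |  customer_id STRING,
        |  event_type  STRING,
        |  event_ts    TIMESTAMP)
        |PARTITIONED BY (load_date STRING)
        |STORED AS AVRO
        |LOCATION '/data/raw/customer_events'""".stripMargin)

    // Register any partition directories that already exist under the table location
    spark.sql("MSCK REPAIR TABLE customer_events")

    // Simple daily aggregation over the external table
    spark.sql(
      """SELECT load_date, event_type, COUNT(*) AS events
        |FROM customer_events
        |GROUP BY load_date, event_type""".stripMargin)
      .write.mode("overwrite").saveAsTable("daily_event_counts")

    spark.stop()
  }
}
```

Using an external table keeps the raw HDFS files under the ingestion pipeline's control while still making them queryable from both Hive and Spark SQL.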
TECHNICAL SKILLS
Language/Technologies: Java 1.7, J2EE Suite, Servlets, JSP, JDBC, XML, Python, Scala
Hadoop Components: HDFS, MapReduce, Hive, Sqoop, Oozie, HBase, Flume, Avro, Spark SQL, Spark Streaming, Zookeeper, Kafka
Frameworks: Hadoop, MapReduce, Struts, Spring Web Flow.
Cloud Technologies: IBM BigInsights, Microsoft Azure, AWS
Databases/NoSQL: DB2, Oracle, MySQL, SQL Server, BigSQL, HBase, MongoDB, Cassandra
Operating Systems: UNIX, Linux, Windows XP/2000/NT
IDE/Modelling Tools: Eclipse, IntelliJ IDEA, Visual Studio
Deployment Tools: OpenMake, Jenkins
Analysis/Design: J2EE Design Patterns, MVC Pattern.
PROFESSIONAL EXPERIENCE
Confidential, Plano, TX
Hadoop Developer
Responsibilities:
- Worked with engineering leads to strategize and develop data flow solutions using Hadoop, Hive, Java, and Perl to address long-term technical and business needs.
- Used Hadoop for batch processing and Apache Spark for real-time stream processing, relying on Spark's ability to recover and pick up where processing stopped without affecting the running job.
- Developed MapReduce jobs on YARN and Hadoop clusters to generate daily and monthly reports per company needs.
- Used Scala whenever fast deployment of Spark jobs was needed.
- Good knowledge of Kafka; worked closely with teams handling real-time data feeds.
- Worked on tools and open source applications to ensure data management objectives are met in terms of data quality, data integrity, and data monitoring.
- Worked on insightful data metrics feeding reporting and other applications.
- Good command of Hive partitioning and bucketing, performing both map-side and reduce-side joins on Hive tables, and implementing Hive SerDes such as Regex, JSON, and Avro.
- Debugged and troubleshot issues in Hive UDFs.
- Managed and scheduled jobs on a Hadoop cluster using Oozie workflows.
- Supported the team through various activities, such as mentoring and training new engineers joining the team and conducting code reviews for data flow/data application implementations.
- Implemented technical solutions for POCs, writing code using technologies such as Hadoop, YARN, and Python.
- Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats including XML, JSON, CSV, and other compressed file formats.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Used the Spark-Cassandra Connector to load data to and from Cassandra (see the sketch following this list).
- Created data models using the Cassandra Query Language (CQL).
- Hands-on experience with Hadoop tools such as MapReduce, Hive, and HBase.
- Good experience with Hive partitioning, bucketing, and different types of joins on Hive tables, and implementing Hive SerDes such as Regex, JSON, and Avro.
- Developed custom MapReduce programs and User Defined Functions (UDFs) in Hive to transform large volumes of data per business requirements.
- Maintained data import scripts using Hive and MapReduce jobs.
- Ability to understand and capture technical as well as business requirements.
- Performed data analysis and design to handle huge amounts of data.
- Cross-examined data loaded into Hive tables against the source data in Oracle.
- Worked side by side with R&D, QA, and Operations teams to understand, design, develop, and support the ETL platform and end-to-end data flow requirements.
- Developed structured, efficient, and error-free code for Big Data requirements using knowledge of Hadoop and its ecosystem.
- Stored, processed, and analyzed huge data sets to extract valuable insights from them.
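A minimal sketch of the Spark-Cassandra Connector usage mentioned above (loading data to and from Cassandra). The keyspace, table, column names, and connection host are hypothetical placeholders, and the sketch assumes the spark-cassandra-connector package is on the classpath.

```scala
// Sketch: read a Cassandra table into Spark, aggregate, and write the result back.
// All names and the host below are illustrative placeholders.
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._   // adds cassandraTable / saveToCassandra

object CassandraLoadSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("CassandraLoadSketch")
      .set("spark.cassandra.connection.host", "cassandra-host")  // placeholder host

    val sc = new SparkContext(conf)

    // Read an existing table into an RDD of CassandraRow
    val events = sc.cassandraTable("analytics", "customer_events")

    // Count events per customer and write the result back to Cassandra
    val counts = events
      .map(row => (row.getString("customer_id"), 1L))
      .reduceByKey(_ + _)

    counts.saveToCassandra("analytics", "event_counts",
      SomeColumns("customer_id", "events"))

    sc.stop()
  }
}
```

The keyspace and tables on the Cassandra side would be created separately in CQL, as noted in the bullet on CQL data modeling.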
Environment: Hadoop 2.6.0 - CDH 5, YARN, Pig, Hive, HBase, Mahout, Spark, Kafka, Red Hat Linux, Java, Scala, Python, Eclipse, Perl, Cloudera Navigator, Cloudera Manager.
Confidential, Fort Mill, SC
Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster using different big data analytic tools including Kafka, Pig, Hive, Impala, and MapReduce.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Performed real-time data streaming using Spark with Kafka.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS using Scala (a minimal sketch follows this list).
- Worked within the Apache Hadoop framework, using Opinion Lab statistics to ingest data from a streaming application program interface (API), automating processes by creating Oozie workflows, and drawing conclusions about consumer sentiment from data patterns surfaced through Hive, for external client use.
- Wrote the Storm topology with HDFS Bolt and Hive Bolts as destinations.
- Expertise in Storm topology development, maintenance, and bug fixes.
- Developed Hadoop streaming MapReduce jobs using Java.
- Worked on debugging, performance tuning of Hive & Pig Jobs.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Experience processing unstructured data using Pig.
- Implemented Partitioning, Dynamic Partitions, Buckets in Hive.
- Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
- Good knowledge of NoSQL databases such as HBase and Cassandra.
- Advanced knowledge in performance troubleshooting and tuning Cassandra clusters.
- Highly involved in development/implementation of Cassandra environment.
- Planned, deployed, monitored, and maintained Amazon AWS cloud infrastructure consisting of multiple EC2 nodes and VMware VMs as required by the environment.
- Involved in scheduling the Oozie workflow engine to run multiple Pig jobs.
- Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Data scrubbing and processing with Oozie.
- Developed Pig Latin scripts to extract data from the web server output files to load into HDFS.
- Involved in developing Hive DDLs to create, alter and drop tables.
- Also explored Spark for improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, Storm, and Spark on YARN.
- Used many features such as parallelize, partitioning, caching (both in-memory and serialized on disk), Kryo serialization, etc.
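A minimal sketch of the Spark Streaming plus Kafka ingestion described above (receiving real-time data from Kafka and persisting it to HDFS in Scala). The broker address, consumer group, topic name, and HDFS path are hypothetical placeholders, and the sketch assumes the spark-streaming-kafka-0-10 integration.

```scala
// Sketch: consume a Kafka topic with Spark Streaming and land each micro-batch in HDFS.
// Broker, group id, topic, and output path below are illustrative placeholders.
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfsSketch")
    val ssc  = new StreamingContext(conf, Seconds(30))    // 30-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "kafka-broker:9092",        // placeholder broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "weblog-ingest",
      "auto.offset.reset"  -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("weblogs"), kafkaParams))

    // Write the raw message payloads of each non-empty batch to HDFS as text files
    stream.map(_.value).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty())
        rdd.saveAsTextFile(s"hdfs:///data/streaming/weblogs/batch-${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Downstream Hive or Pig jobs can then pick up the landed batches from HDFS for further analysis.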
Environment: Hadoop (Cloudera distribution), Big Data, HDFS, MapReduce, Sqoop, Spark, Hive, HBase, Linux, Java, Eclipse, PL/SQL, Cassandra, Tableau, UNIX Shell Scripting, PuTTY, AWS (Amazon Web Services).
Confidential, Boston, MA
Java/Hadoop Developer
Responsibilities:
- Launched and set up Hadoop-related tools on AWS, including configuring the different components of Hadoop.
- Worked on analyzing data and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
- Used Sqoop to connect to SQL Server or Oracle databases and move the pivoted data into Hive tables stored as Avro files.
- Managed the Hive database, which involved ingesting and indexing data.
- Expertise in exporting data from Avro files and indexing the documents in SequenceFile or SerDe-defined formats.
- Hands-on experience writing custom UDFs as well as custom input and output formats.
- Scheduled Hive jobs using Oozie workflow files.
- Developed MapReduce jobs to store data into Hive tables.
- Involved in design and architecture of custom Lucene storage handler.
- Maintained the test mini cluster in AWS.
- Involved in GUI development using JavaScript, AngularJS, and Guice.
- Developed unit test cases using the JMockit framework and automated the scripts.
- Worked in an Agile environment, using Jira to maintain story points.
- Involved in implementing Kerberos secured environment for Hadoop cluster.
Environment: Hadoop, Big Data, Hive, HBase, Sqoop, Oozie, HDFS, MapReduce, Jira, Bitbucket, Maven, J2EE, Guice, AngularJS, JMockit, Lucene, Unix, SQL, AWS (Amazon Web Services).
Confidential
Java Developer
Responsibilities:
- Developed the system by following the agile methodology.
- Involved in the implementation of design using vital phases of the Software development life cycle (SDLC) that includes Development, Testing, Implementation and Maintenance Support.
- Applied OOAD principles for the analysis and design of the system.
- Created real-time web applications using Node.js.
- Used WebSphere Application Server to deploy the build.
- Developed front-end screens using JSP, HTML, jQuery, JavaScript, and CSS.
- Used Spring Framework for developing business objects.
- Performed data validation in Struts Form beans and Action Classes.
- Used Eclipse for the Development, Testing and Debugging of the application.
- Used a DOM parser to parse the XML files.
- Used the Log4j framework for logging debug, info, and error data.
- Used Oracle 10g Database for data persistence.
- SQL Developer was used as a database client.
- Used WinSCP to transfer files from the local system to other systems.
- Performed Test Driven Development (TDD) using JUnit.
- Used Ant script for build automation.
- Used Rational Clear Quest for defect logging and issue tracking.
Environment: Windows XP, Unix, Java, Design Patterns, WebSphere, Apache Ant, J2EE (Servlets, JSP), HTML, JSON, JavaScript, CSS, Struts, Spring, Hibernate, Eclipse, Oracle 10g, SQL Developer, WinSCP, Log4j and JUnit.
Confidential
Software Engineer
Responsibilities:
- Implemented different suites of Linux infrastructure.
- Evaluated new hardware, software and infrastructure solutions.
- Provided 24x7 on-call rotation coverage and assistance for the team.
- Developed and maintained installation and configuration procedures.
- Performed backup and restores in Linux environment.
- Implemented system and maintenance tasks using shell scripts.
- Developed data flow, Entity Relationship and data structure diagrams.
- Queried the incoming data using MySQL.
- Worked on adding and configuring devices like hard disks, etc.
- Developed database applications using SQL and PL/SQL.
Environment: Shell scripting, Linux and MySQL.