Sr. Hadoop Developer Resume
Austin, TX
SUMMARY
- Around 9 years of experience in application analysis, design, development, maintenance, and support of web and client-server applications in Java/J2EE technologies, including 6+ years with Big Data and Hadoop components such as HDFS, MapReduce, Pig, Hive, YARN, Sqoop, Flume, Spark, Scala, and Kafka.
- Experience with multiple Hadoop distributions, including Cloudera and Hortonworks.
- Excellent understanding of NoSQL databases like HBase, Cassandra and MongoDB.
- Experience working with structured and unstructured data in various file formats such as Avro, XML, JSON, and sequence files using MapReduce programs.
- Work experience with cloud platforms such as Amazon Web Services (AWS).
- Implemented custom business logic and performed join optimization, secondary sorting, and custom sorting using MapReduce programs (see the secondary-sort sketch following this summary).
- Experienced in testing and running MapReduce pipelines.
- Expertise in data ingestion using Sqoop, Apache Kafka, Spark Streaming, and Flume.
- Implemented business logic using Pig scripts and wrote custom Pig UDFs to analyze data.
- Performed ETL operations using Pig to join, clean, aggregate, and analyze data.
- Hands-on experience streaming live data from DB2 into HBase tables using Spark Streaming and Apache Kafka (see the streaming sketch following this summary).
- Experience in designing, configuring, and installing DataStax Cassandra.
- Good understanding of Conceptual, Logical and Physical Data Modeling.
- Experience with the Oozie workflow engine to automate and parallelize Hadoop MapReduce and Pig jobs.
- Extensive experience writing SQL queries in HiveQL to perform analytics on data.
- Experience performing data validation using Hive dynamic partitioning and bucketing.
- Experienced in importing and exporting data between RDBMS/Teradata and HDFS using Sqoop.
- Experienced in handling streaming data, such as web server logs, using Flume.
- Good knowledge of analyzing data with Python scripting for Hadoop Streaming.
- Worked extensively with Spark DataFrames, Spark SQL, and Spark MLlib.
- Experience implementing Spark applications in Scala and Spark SQL for faster data processing.
- Extensive hands-on experience accessing and performing CRUD operations on HBase data using the Java API, and implementing time-series data management.
- Hands-on experience with message brokers such as Apache Kafka.
- Involved in planning the stages of migrating data from RDBMS to Cassandra.
- Expertise in benchmarking and load testing Cassandra clusters using the cassandra-stress tool.
- Involved in various data mining tasks such as pattern mining, classification, and clustering.
- Experienced in J2EE, Spring, Hibernate, SOAP/REST web services, JMS, JNDI, and EJB.
- Expertise with application servers and web servers such as Oracle WebLogic, IBM WebSphere, Apache Tomcat, JBoss, and VMware.
- Experienced in developing unit test cases using MRUnit and JUnit.
- Experience in using Maven and Ant for build automation.
- Experience working in environments using Agile (SCRUM) and Waterfall methodologies.
- Expertise in database modeling, administration, and development using SQL and PL/SQL in Oracle (8i, 9i, and 10g), MySQL, DB2, and SQL Server environments.
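Illustrative sketch (secondary sorting): the summary above mentions secondary sorting done with MapReduce programs; for brevity the same idea is sketched here in Scala using Spark's repartitionAndSortWithinPartitions rather than raw MapReduce. All names and data are hypothetical placeholders, not project artifacts.

    import org.apache.spark.{Partitioner, SparkConf, SparkContext}

    // Partition on the first half of a composite key so all records for one id
    // land in the same partition, pre-sorted by (id, timestamp) within it.
    class IdPartitioner(parts: Int) extends Partitioner {
      override def numPartitions: Int = parts
      override def getPartition(key: Any): Int = key match {
        case (id: String, _) => math.abs(id.hashCode) % parts
      }
    }

    object SecondarySortSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("secondary-sort").setMaster("local[*]"))
        // Hypothetical input: ((deviceId, timestamp), reading)
        val readings = sc.parallelize(Seq(
          (("dev1", 2L), 0.5), (("dev1", 1L), 0.3), (("dev2", 1L), 0.9)))
        readings
          .repartitionAndSortWithinPartitions(new IdPartitioner(2))
          .collect()
          .foreach(println)
        sc.stop()
      }
    }

Partitioning on the id alone while sorting on the full (id, timestamp) key is what delivers each consumer its records already ordered, the same design as a MapReduce composite-key secondary sort.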
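Illustrative sketch (Kafka to HBase streaming): a minimal Scala outline of the DB2-to-HBase streaming pattern referenced above, using the spark-streaming-kafka-0-10 and HBase client APIs. The broker address, topic, table, and column-family names are hypothetical, and the sketch assumes keyed messages.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._

    object KafkaToHBaseSketch {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("kafka-to-hbase"), Seconds(10))
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker:9092", // hypothetical broker
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "db2-change-feed")
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("db2.changes"), kafkaParams))
        stream.foreachRDD { rdd =>
          rdd.foreachPartition { records =>
            // One HBase connection per partition, not per record.
            val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
            val table = conn.getTable(TableName.valueOf("change_events"))
            records.foreach { r =>
              val put = new Put(Bytes.toBytes(r.key)) // assumes a non-null message key
              put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(r.value))
              table.put(put)
            }
            table.close(); conn.close()
          }
        }
        ssc.start(); ssc.awaitTermination()
      }
    }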
TECHNICAL SKILLS
BigData / Hadoop: HDFS / MapReduce / Hive / Pig / HBase / YARN / Sqoop / Flume / Oozie / Scala / Kafka / Apache Spark / Spark SQL / AWS / Talend.
Databases / NoSQL: Cassandra / MongoDB / HBase / Hive / SQL / PL/SQL / Oracle.
Web Technologies: HTML / CSS / AJAX / JavaScript / jQuery.
Web Services: SOAP / REST / XML / XSD.
J2EE Frameworks: Hibernate / Spring / JMS / JSF.
Operating Systems: Windows / Unix / Linux.
Methodologies: Agile (Scrum) / Waterfall.
IDEs / Tools: Eclipse / NetBeans / Microsoft Visio.
Build / Logging Tools: Maven / Apache Ant / Log4j.
PROFESSIONAL EXPERIENCE
Confidential, Austin, TX
Sr. Hadoop Developer
Responsibilities:
- Designed a pipeline to collect, clean, and prepare data for analysis using MapReduce, Spark, Pig, Hive, and HBase, with reporting in Tableau.
- Developed a script to send large amounts of data to any HTTP server, configurable by number of users, operations, and date range.
- Created Tableau reports backed by HiveQL queries.
- Created and modified UDFs and UDAFs for Hive and Pig as needed.
- Managed and reviewed Hadoop log files to identify issues when jobs failed.
- Used Apache Kafka to handle log messages consumed by multiple systems.
- Performed data staging validation using Talend.
- Performed unit and integration testing with Hue.
- Worked extensively with Spark DataFrames, Spark SQL, and Spark MLlib.
- Worked with the data science team on Spark MLlib applications for various predictive models (a minimal sketch follows this section).
- Involved in installing, configuring, and managing Hadoop ecosystem components such as Hive, Pig, Sqoop, and Flume.
- Hands-on experience importing and exporting data between relational databases and HDFS using Sqoop.
- Worked on Impala for creating tables and querying data.
- Implemented daily workflows for extraction, processing, and analysis of data with Oozie.
- Processed source data into structured form and stored it in Cassandra.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Involved in creating Hive tables and loading them with data.
Environment: Hortonworks HDP, Java, Kafka, Pig, Hive, HDFS, Cassandra, UNIX, Spark, Scala, HBase, HiveQL, AWS, Tableau.
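Illustrative sketch (Spark MLlib pipeline): a minimal Scala outline of the predictive-model pattern referenced above, assembling features and fitting a logistic regression with the spark.ml Pipeline API. The table and column names are hypothetical placeholders.

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.sql.SparkSession

    object PredictiveModelSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("predictive-model").enableHiveSupport().getOrCreate()
        // Hypothetical feature table produced by the upstream Hive/Pig pipeline.
        val df = spark.table("analytics.customer_features")
        val assembler = new VectorAssembler()
          .setInputCols(Array("tenure", "monthly_spend", "support_calls"))
          .setOutputCol("features")
        val lr = new LogisticRegression().setLabelCol("churned").setFeaturesCol("features")
        val model = new Pipeline().setStages(Array(assembler, lr)).fit(df)
        model.transform(df).select("churned", "prediction").show(10)
        spark.stop()
      }
    }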
Confidential, Long Beach, CA
Hadoop Developer
Responsibilities:
- Experience using Avro, Parquet, RCFile, and JSON file formats; developed UDFs for Hive and Pig.
- Developed a custom MapReduce InputFormat to read specific data formats.
- Developed and maintained workflow scheduling jobs in Oozie.
- Used Sqoop to transfer data from external sources to HDFS.
- Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores, tuple stores, NoSQL, Hadoop, Pig, MySQL, and Oracle.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experience loading and transforming large sets of structured, semi-structured, and unstructured data.
- Responsible for loading unstructured and semi-structured data from different sources into the Hadoop cluster using Flume.
- Worked with different file formats, including XML files, sequence files, CSV, and map files.
- Continuously monitored and managed Hadoop cluster using Cloudera Manager.
- Performed POCs using newer technologies such as Spark, Kafka, and Scala (see the Spark/Scala sketch after this section).
- Created Hive tables, loaded them with data, and wrote Hive queries.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Experience in managing and reviewing Hadoop log files.
- Analyzed web logs using Hadoop tools for operational and security-related activities.
- Used Pig's complex data types (tuples, bags, and maps) for handling data.
- Developed efficient MapReduce programs in Java to filter unstructured data.
- Supported MapReduce programs running on the cluster.
- Managed and reviewed Hadoop log files to identify issues when jobs failed.
Environment: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Java, XML, SQL, MySQL, Scala, Sqoop, Oozie.
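Illustrative sketch (Spark/Scala POC): a minimal example of the kind of Spark POC referenced above, filtering web-server logs for 5xx responses. The HDFS path is a hypothetical placeholder, and the parsing assumes the Apache combined log format.

    import org.apache.spark.sql.SparkSession

    object LogFilterSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("log-filter-poc").getOrCreate()
        val logs = spark.sparkContext.textFile("hdfs:///data/weblogs/*.log") // hypothetical path
        // Combined log format: token 8 (0-based) is the HTTP status code.
        val serverErrors = logs
          .map(_.split(" "))
          .filter(fields => fields.length > 8 && fields(8).startsWith("5"))
        println(s"5xx responses: ${serverErrors.count()}")
        spark.stop()
      }
    }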
Confidential, Cupertino, CA
Hadoop Consultant
Responsibilities:
- Involved in creating Hive tables, loading them with data, and writing Hive queries.
- Involved in data ingestion into HDFS using Sqoop and Flume from a variety of sources.
- Developed MapReduce programs to parse the raw data, populate tables and store the refined data in partitioned tables.
- Installed and configured Hadoop and the Hadoop stack on a 4-node cluster.
- Experienced in managing and reviewing application log files.
- Ingested application logs into HDFS and processed them using MapReduce jobs.
- Created and maintained the Hive warehouse for Hive analysis.
- Generated test cases for new MapReduce jobs.
- Led and programmed the recommendation logic for various clustering and classification algorithms in Java.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Responsible for designing and creating Hive tables with partitioning and bucketing, loading data, and writing Hive queries (a minimal sketch follows this section).
- Created HBase tables to store various data formats of personally identifiable information (PII) coming from different portfolios.
- Involved in managing and reviewing Hadoop log files.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
Environment: HDFS, Hive, Scala, MapReduce, Storm, Java, HBase, Pig, Sqoop, Shell scripts, Oozie, MySQL, Tableau, Eclipse, Web services, Oracle 11g/SQL, JDBC, and WebSphere Application Server.
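Illustrative sketch (Hive dynamic partitioning): a minimal Scala outline of the partitioned Hive table pattern referenced above, run here through Spark SQL with Hive support. Table and column names are hypothetical; a bucketed table would add a CLUSTERED BY ... INTO n BUCKETS clause to the DDL.

    import org.apache.spark.sql.SparkSession

    object PartitionedLoadSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("hive-partitioned-load").enableHiveSupport().getOrCreate()
        // Dynamic partitioning lets Hive route each row by the partition column's value.
        spark.sql("SET hive.exec.dynamic.partition=true")
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        spark.sql("""CREATE TABLE IF NOT EXISTS txns_part (
                       txn_id BIGINT, amount DOUBLE, account STRING)
                     PARTITIONED BY (txn_date STRING)
                     STORED AS ORC""")
        spark.sql("""INSERT OVERWRITE TABLE txns_part PARTITION (txn_date)
                     SELECT txn_id, amount, account, txn_date FROM staging_txns""")
        spark.stop()
      }
    }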
Confidential, Fremont, CA
Software Engineer
Responsibilities:
- Worked on marshalling and unmarshalling XML using the JiBX parser.
- Interpreted and modified Spring and Hibernate configuration files.
- Worked on JMS and message queue (MQ) configurations.
- Designed and developed GUI screens for user interfaces using JSP, JavaScript, XSLT, AJAX, XML, HTML, CSS, and JSON.
- Configured, designed, implemented, and monitored Kafka clusters and connectors.
- Generated class and sequence diagrams extensively for the entire process flow using RAD.
- Consumed external web services by creating service contracts through WSRR from different development centers.
- Worked on SOAP-based web services and tested them using SoapUI.
- Used Jenkins to build the application on the server.
- Developed documentation for the QA environment.
- Loaded records from the legacy database into Cassandra.
- Synchronized create, update, and delete operations between the legacy database and Cassandra (see the synchronization sketch after this section).
- Created stored procedures, SQL statements, and triggers for efficient retrieval and storage of data.
- Developed the application using Agile (Scrum) and iterative processes.
- Used the Apache Log4j logging API to log errors and messages.
- Deployed applications on UNIX environments for Dev and QA smoke testing.
- Unit tested the application using JUnit and EasyMock.
Environment: JDK, Spring Framework, XML, HTML, Cassandra, JSP, Hibernate, Ant, JavaScript, XSLT, CSS, AJAX, JMS, SOAP Web Services, WebSphere Application Server, PL/SQL, JUnit, Log4j, Shell scripting, UNIX.
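Illustrative sketch (legacy-to-Cassandra synchronization): a minimal Scala outline of the pattern referenced above, using the DataStax Java driver (3.x API). The contact point, keyspace, table, and sample row are hypothetical placeholders.

    import com.datastax.driver.core.Cluster

    object LegacySyncSketch {
      def main(args: Array[String]): Unit = {
        val cluster = Cluster.builder().addContactPoint("127.0.0.1").build() // hypothetical node
        val session = cluster.connect("legacy_mirror")
        // In Cassandra an INSERT is an upsert, so creates and updates from the
        // legacy feed map to the same prepared statement; deletes map to DELETE.
        val upsert = session.prepare("INSERT INTO customers (id, name, email) VALUES (?, ?, ?)")
        session.execute(upsert.bind(Integer.valueOf(42), "Jane Doe", "jane@example.com"))
        session.execute("DELETE FROM customers WHERE id = 42")
        session.close()
        cluster.close()
      }
    }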
Confidential
Java/J2EE Developer
Responsibilities:
- Developed front-end screens using JSP, HTML, and CSS.
- Developed server-side code using Struts and Servlets.
- Developed core Java classes for exceptions, utility classes, business delegates, and test cases.
- Developed SQL queries against MySQL and established connectivity.
- Worked in Eclipse using the Maven plugin for the Eclipse IDE.
- Designed the application's user interface using HTML5, CSS3, JSF 2.0, JSP, and JavaScript.
- Tested application functionality with JUnit test cases.
- Developed all user interfaces using JSP and the Struts framework.
- Wrote client-side validations using JavaScript.
- Extensively used jQuery to develop interactive web pages.
- Experience developing web services for production systems using SOAP and WSDL.
- Developed user interface presentation screens using HTML, XML, and CSS.
- Experience working with Spring using AOP, IoC, and the JDBC template.
- Developed shell scripts to trigger the Java batch job and to email summaries of batch status and processing results.
- Developed the application in the Eclipse IDE and deployed it on a Tomcat server.
- Supported bug fixes and functionality changes.
Environment: Java, Struts 1.1, Servlets, JSP, HTML, CSS, JavaScript, Eclipse 3.2, Tomcat, Maven, MySQL, Windows and Linux, JUnit.