
Hadoop Developer Resume

SUMMARY

  • Experienced with major Hadoop ecosystem projects such as MapReduce, Hive, Pig, HBase, Sqoop, Spark, Scala, and Oozie with Cloudera Manager.
  • Hands-on experience with Cloudera and multi-node clusters on the Cloudera Sandbox.
  • Expertise in designing tables in Hive, Pig, and MySQL, and in importing and exporting data between databases and HDFS using Sqoop.
  • Good knowledge of AWS and S3 for data interaction with Hadoop ecosystems.
  • Experienced in data architecture, including data ingestion pipeline design, Hadoop architecture, data modeling, machine learning, and advanced data processing.
  • Experience optimizing ETL workflows that process data coming from different sources.
  • Hands-on experience with Spark Core, Spark SQL, Spark Streaming, the MapReduce and Pig programming models, and installation and configuration of Hadoop, HBase, Hive, Pig, Sqoop, and Flume using Linux commands.
  • Handled text, JSON, XML, Avro, SequenceFile, and Parquet log data using Hive SerDes and Pig, filtering the data as required by each query.
  • ETL: data extraction, management, aggregation, and loading into the NoSQL database HBase.
  • Good understanding of SDLC and STLC.
  • Expertise in developing web services, including XML, CSS, HTML, SOAP/REST requests and responses, WSDL, UDDI, REST APIs, JAX-RPC, JAX-WS, and web service authentication.
  • Expertise in developing Pig Latin scripts and Hive Query Language (HiveQL).
  • Proficiency in Linux (UNIX) and Windows OS.
  • Extensive knowledge of the ZooKeeper process for various types of centralized configuration.
  • Experience with the Oozie workflow engine, running workflow jobs with actions that launch Hadoop MapReduce and Pig jobs.
  • Experienced in integrating various data sources, including Java, JDBC, RDBMS, shell scripting, spreadsheets, and text files.
  • Experience in managing and reviewing Hadoop log files using Flume; developed Pig and Hive UDFs to pre-process data for analysis.
  • Hands-on experience with Spark for handling streaming data.
  • Shell scripting to load and process data from various Enterprise Resource Planning (ERP) sources.
  • Experienced in creative and effective front-end development using JSP, JavaScript, HTML5, DHTML, XHTML, Ajax, and CSS.
  • Hands-on experience in writing Pig Latin scripts and using the Pig interpreter to run MapReduce jobs.
  • Hands-on experience with Scala for batch processing and Spark streaming.
  • Good understanding of Hadoop architecture and its daemons, including NameNode, DataNode, JobTracker, TaskTracker, and ResourceManager.
  • Hands-on experience in ingesting data into data warehouses using various data-loading techniques.
  • Developed UML diagrams for object-oriented design: use cases, sequence diagrams, and class diagrams using Rational Rose and Visio.
  • Good working knowledge of IBM WebSphere Application Server and the IBM RSA tool.
  • Worked with IDEs such as Eclipse/MyEclipse, RAD, NetBeans, and IntelliJ for developing, deploying, and debugging applications.
  • Expertise in working with relational databases such as Oracle 11g/10g/9i and SQL Server 2012.
  • Good knowledge of stored procedures, functions, and related constructs in SQL and PL/SQL.
  • Expertise in J2EE technologies like Servlets, JSP, Struts, Hibernate and JDBC.
  • Experience with test automation tools such as SoapUI and SoapUI Pro.
  • Experience in Agile methodology; worked as a Scrum Master.
  • Extensive experience with RAD 6.0, RSA, and Eclipse 3.1.2.
  • Experienced with web/application servers such as IBM WebSphere 5.1/6.0/7.0 and JBoss.
  • Expertise in using version control and related tools such as CVS (Concurrent Versions System), StarTeam, SVN, Jenkins, Sonar, and RTC.
  • Experience in TDD and BDD.
  • Excellent problem-solving, documentation, and communication skills.

TECHNICAL SKILLS

Big Data: HDFS, YARN, MapReduce, Pig, Hive, HBase, Spark, Scala, Sqoop, AWS, Oozie, Kafka.

Programming Languages: Core Java, UNIX shell scripting, SQL; working knowledge of Python.

J2EE Technologies: JSP, Servlets, JMS, EJB, JDBC, JAAS, JNDI

Frameworks: Jakarta Struts, Apache Wicket, AJAX, JUnit, NUnit, TestNG.

Web Services: WSDL, SOAP, Apache, REST.

Client Technologies: JavaScript, AJAX, CSS, HTML5, XHTML

Operating Systems: UNIX, Linux, Windows

Application Servers: IBM WebSphere, Tomcat, WebLogic

Web technologies: JSP, Servlets, Socket Programming, JNDI, JDBC, Java Beans, JavaScript, Web Services (JAX-WS)

Databases: Oracle 8i/9i/10g, Microsoft SQL Server, DB2 & MySQL 4.x/5.x

Java IDEs: Eclipse 3.x, IBM WebSphere Application Developer, IBM RAD 7.0

Tools: RTC, RSA, Control-M, Oozie, Hue, SQL Developer, SOAP UI, ANT, Maven.

PROFESSIONAL EXPERIENCE

Confidential

Hadoop Developer

Environment: Cloudera on CentOS, Cloudera Manager, HDFS, Spark 1.6, Spark SQL, Scala 2.10, Sqoop, Hive, JIRA, Oozie, Bitbucket, Hue, DB2, Jenkins.

Responsibilities:

  • Converted Ab Initio functions such as string, date, and numeric enrichments to Spark and Scala (see the sketch after this list).
  • Gathered the matching requirements for invoice and transaction data.
  • Designed rules and filters on top of the invoice and transaction data.
  • Created mapping parameters for matching data.
  • Worked on HDFS, Spark, and Scala systems on cloud networks.
  • Created manual Hive queries for the matching rules.
  • Wrote shell scripts to automate the process flow.
  • Developed rules and filters using Spark SQL and RDDs.
  • Performed unit testing.
  • Tested all rules and filters.
  • Participated in code reviews of the data selection code.
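
A minimal sketch, in Scala, of how the Ab Initio-style enrichments and matching filters above could be expressed with Spark SQL. The table names, column names, and the specific matching rule are hypothetical illustrations; the snippet uses the Spark 2.x SparkSession API for brevity, whereas on the Spark 1.6 environment listed above a HiveContext would play the same role.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object InvoiceMatching {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("InvoiceMatching").enableHiveSupport().getOrCreate()
        import spark.implicits._

        // Hypothetical staging tables loaded from DB2 via Sqoop.
        val invoices     = spark.table("staging.invoices")
        val transactions = spark.table("staging.transactions")

        // String / date / numeric enrichments, analogous to Ab Initio enrichment functions.
        val enriched = invoices
          .withColumn("vendor_name", upper(trim($"vendor_name")))               // string enrichment
          .withColumn("invoice_dt",  to_date($"invoice_dt", "yyyyMMdd"))        // date enrichment
          .withColumn("amount",      round($"amount".cast("decimal(18,2)"), 2)) // numeric enrichment

        // One illustrative matching rule: same vendor and amount, transaction within 5 days.
        val matched = enriched.alias("i")
          .join(transactions.alias("t"),
            $"i.vendor_name" === $"t.vendor_name" &&
              $"i.amount" === $"t.amount" &&
              datediff($"t.txn_dt", $"i.invoice_dt").between(0, 5))
          .select($"i.invoice_id", $"t.txn_id", $"i.vendor_name", $"i.amount", $"t.txn_dt")

        // Filter step standing in for the business-rule exclusions.
        matched.filter($"amount" > 0)
          .write.mode("overwrite").saveAsTable("matched_invoice_txn")

        spark.stop()
      }
    }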

Confidential

Data Analyst

Environment: Cloudera on CentOS, Spark 1.6, Spark SQL, Scala 2.10, Sqoop, Oracle, DB2, Netezza.

Responsibilities:

  • Gathered requirements in coordination with the BA.
  • Worked closely with the BA and vendor to create technical documents such as high-level and low-level design specifications.
  • Designed mappings for audit control on a POC basis.
  • Created mapping parameters to define the delta calculations.
  • Performed unit testing.
  • Worked on HDFS, Spark, and Scala systems on cloud networks.
  • Wrote Hive queries to read from HBase.
  • Wrote shell scripts to automate the process flow.
  • Wrote Hive and Pig queries and UDFs on different datasets and joined them.
  • Used Sqoop for data transfer between MS SQL and HDFS.
  • Used Impala for ad-hoc query testing.
  • Stored the extracted data in HDFS using Flume.
  • Serialized JSON data and stored it in Hive tables.
  • Defined the schema for multi-nested JSON files using a Hive SerDe (see the sketch after this list).
  • Applied Hive data sampling, bucketing, and clustering methods to the schema.
  • Wrote and scheduled Hadoop job workflows using Oozie.
  • Used GreenHopper and JIRA within an Agile methodology for task distribution and estimates.
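
A minimal sketch of a Hive SerDe table definition for multi-nested JSON, submitted here through Spark SQL for consistency with the rest of the stack. The table name, field layout, and HDFS location are hypothetical, and the choice of the hive-hcatalog JsonSerDe (which needs hive-hcatalog-core on the classpath) is an assumption; the same DDL could be run directly from the Hive CLI.

    import org.apache.spark.sql.SparkSession

    object NestedJsonSchema {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("NestedJsonSchema")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical multi-nested JSON layout: one event per line in HDFS.
        spark.sql("""
          CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
            event_id   STRING,
            event_ts   STRING,
            customer   STRUCT<id: STRING, name: STRING,
                              address: STRUCT<city: STRING, zip: STRING>>,
            line_items ARRAY<STRUCT<sku: STRING, qty: INT, price: DOUBLE>>
          )
          ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
          LOCATION '/data/raw/events'
        """)

        // Flatten the nested structure the way a downstream Hive query would.
        val flattened = spark.sql("""
          SELECT event_id, customer.id AS customer_id, customer.address.city AS city,
                 item.sku, item.qty, item.price
          FROM raw_events
          LATERAL VIEW explode(line_items) li AS item
        """)
        flattened.show(10)
        spark.stop()
      }
    }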

Confidential

Big Data Developer

Environment: Apache Hadoop, HDFS, Cloudera Manager, CentOS, Java, MapReduce, Eclipse Indigo, Hive, Pig, Sqoop, Flume, and SQL.

Responsibilities:

  • Set the FTP server path from which files are to be read in the "mapred.input.dir" property of the Hadoop configuration.
  • Used the Apache Commons Net FTPClient API to connect to the FTP server and send commands to it.
  • Checked whether the given input path exists on the server; if it does not, an exception is thrown with an appropriate error message.
  • Used the Hadoop compression codec API to determine whether each file is splittable.
  • If a file is splittable, prepared splits for it using the defined split size.
  • Submitted the map tasks, with one map task created per prepared split (see the first sketch after this list).
  • Finally, closed the connection to the FTP server.
  • HBase: each HBase table represents one sensitive data type and can hold billions of rows.
  • Masking (HBase Put): stores each hash and encrypted ciphertext as a key-value pair.
  • Unmasking (HBase Get): queries the HBase table for the encrypted ciphertext and retrieves it in sub-second time (see the second sketch after this list).
  • Jobs are configured in the HBase trans conf table; for all of the above scenarios, the jobs are already configured there.
  • Pig: encrypted and decrypted tokenized sensitive data with Pig UDFs, using FTP and tokenization scripts (see the third sketch after this list).
  • Hive: encrypted data is loaded into a Hive table by the encryption job, and this is the table from which data is queried for decryption (the table schema is defined based on the input dataset).
  • Developed MapReduce programs that run on the cluster.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Experienced in defining job flows.
  • Responsible for operational support of the production system.
  • Loaded log data directly into HDFS using Flume.
  • Experienced in managing and reviewing Hadoop log files.
  • Analyzed data with Hive, Pig, and Hadoop Streaming.
  • Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster.
  • Setup and benchmarked Hadoop/HBase clusters for internal use.
  • Developed Java MapReduce programs for the analysis of sample log file stored in cluster.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Developed Map Reduce Programs for data analysis and data cleaning.
  • Developed Pig Latin scripts for the analysis of semi-structured data.
  • Developed industry-specific UDFs (user-defined functions).
  • Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
  • Used Sqoop to import data into HDFS and Hive from other data systems.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Developed Hive queries to process the data for visualizing.
  • Participated in meetings and releases, working closely with teammates and managers.
  • Developed on Hadoop technologies including HDFS, MapReduce 2, YARN, Hive, HBase, and Sqoop.
  • Translated, loaded, and streamed disparate datasets in multiple formats and from multiple sources, including Avro, JSON/Kafka queues, and Flume.
  • Translated functional and technical requirements into detailed programs running on Hadoop MapReduce.
  • Migrated traditional database code to distributed system code (mainly HiveQL).
  • Implemented ETL to load data into Hadoop with Sqoop.
  • Used HBase for scalable storage and fast query.
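
First sketch: a minimal Scala illustration of the FTP-driven ingestion checks described above: the input directory is read from "mapred.input.dir", its existence is verified with the Apache Commons Net FTPClient, and the Hadoop compression codec API decides whether each file is splittable. The host, credentials, and directory are hypothetical placeholders, and split preparation and map-task submission are only indicated in comments.

    import org.apache.commons.net.ftp.FTPClient
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.compress.{CompressionCodecFactory, SplittableCompressionCodec}

    object FtpIngestPlanner {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()
        // Input directory on the FTP server, recorded in the same property the job used.
        conf.set("mapred.input.dir", "/outbound/daily")

        val ftp = new FTPClient()
        ftp.connect("ftp.example.com")          // hypothetical host
        if (!ftp.login("ingest", "secret"))     // hypothetical credentials
          sys.error("FTP login failed")

        val inputDir = conf.get("mapred.input.dir")
        // Fail fast when the configured path does not exist on the server.
        if (!ftp.changeWorkingDirectory(inputDir))
          throw new IllegalArgumentException(s"Input path does not exist on FTP server: $inputDir")

        val codecs = new CompressionCodecFactory(conf)
        for (file <- ftp.listFiles(inputDir) if file.isFile) {
          val codec = codecs.getCodec(new Path(file.getName))
          // Same rule as TextInputFormat: uncompressed or splittable-codec files can be split.
          val splittable = codec == null || codec.isInstanceOf[SplittableCompressionCodec]
          println(s"${file.getName}: size=${file.getSize} splittable=$splittable")
          // In the real input format, one split per chunk of the configured split size
          // would be prepared here, and one map task submitted per split.
        }

        ftp.logout()
        ftp.disconnect()
      }
    }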
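
Second sketch: a minimal illustration of the masking (HBase Put) and unmasking (HBase Get) flow using the standard HBase client API. The table name, column family, qualifier, and placeholder ciphertext are hypothetical, and the encryption step itself is assumed to happen elsewhere in the job; a hash of the clear value serves as the row key so that a point Get can return the ciphertext quickly.

    import java.nio.charset.StandardCharsets
    import java.security.MessageDigest

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object SensitiveDataVault {
      private val Family    = Bytes.toBytes("d")      // hypothetical column family
      private val Qualifier = Bytes.toBytes("cipher") // hypothetical qualifier

      // Row key is a SHA-256 hash of the clear value, so lookups never expose the original.
      private def rowKey(clearText: String): Array[Byte] =
        MessageDigest.getInstance("SHA-256").digest(clearText.getBytes(StandardCharsets.UTF_8))

      def main(args: Array[String]): Unit = {
        val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = connection.getTable(TableName.valueOf("ssn_vault")) // hypothetical table per data type

        // Masking: store hash -> ciphertext (encryption itself happens upstream in the job).
        val cipherText = "placeholder-ciphertext"
        val put = new Put(rowKey("123-45-6789"))
        put.addColumn(Family, Qualifier, Bytes.toBytes(cipherText))
        table.put(put)

        // Unmasking: a point Get on the hash key returns the ciphertext in sub-second time.
        val result = table.get(new Get(rowKey("123-45-6789")))
        val fetched = Bytes.toString(result.getValue(Family, Qualifier))
        println(s"ciphertext for lookup key: $fetched")

        table.close()
        connection.close()
      }
    }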
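
Third sketch: a minimal Pig EvalFunc, written in Scala against Pig's Java UDF API, returning a Base64-encoded AES encryption of a sensitive field. The hard-coded key and ECB mode are purely illustrative; a real tokenization UDF would load keys from a keystore and use an authenticated cipher mode, and the Scala library jar would need to be on the Pig classpath.

    import java.nio.charset.StandardCharsets
    import java.util.Base64

    import javax.crypto.Cipher
    import javax.crypto.spec.SecretKeySpec

    import org.apache.pig.EvalFunc
    import org.apache.pig.data.Tuple

    // Pig EvalFunc that returns the AES-encrypted, Base64-encoded form of a sensitive field.
    class EncryptField extends EvalFunc[String] {
      // Illustrative key only; never hard-code keys in production code.
      private val key = new SecretKeySpec("0123456789abcdef".getBytes(StandardCharsets.UTF_8), "AES")

      override def exec(input: Tuple): String = {
        if (input == null || input.size() == 0 || input.get(0) == null) return null
        val cipher = Cipher.getInstance("AES/ECB/PKCS5Padding")
        cipher.init(Cipher.ENCRYPT_MODE, key)
        val encrypted = cipher.doFinal(input.get(0).toString.getBytes(StandardCharsets.UTF_8))
        Base64.getEncoder.encodeToString(encrypted)
      }
    }

In Pig Latin the UDF would then be registered and applied along the lines of: REGISTER encrypt-udf.jar; DEFINE encrypt EncryptField(); masked = FOREACH raw GENERATE encrypt(ssn); (jar, relation, and field names hypothetical).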
