Big Data Engineer / Hadoop Developer Resume

Cincinnati, OH

SUMMARY:

  • 5+ years of IT experience across various industries, including 4 years of hands-on experience developing Big Data and Hadoop applications.
  • Strong technical foundation with in-depth knowledge of Big Data/Hadoop, Data Reporting, Data Design, Data Analysis, Data Governance, Data Integration and Data Quality.
  • Experience in setting up, configuring and monitoring Hadoop clusters on Cloudera and Hortonworks distributions.
  • Deep and extensive knowledge of HDFS, Spark, Apache NiFi, MapReduce, Pig, Hive, HBase, Sqoop, Storm, YARN, Flume, Oozie, ZooKeeper, Cassandra, MongoDB, etc.
  • Thorough knowledge of Hadoop architecture and its components, such as HDFS, NameNode, DataNode, ApplicationMaster, ResourceManager, NodeManager, JobTracker and TaskTracker, and of the MapReduce programming paradigm.
  • Good understanding of Hadoop MR1 and MR2 (YARN) architecture.
  • Experience in analyzing data using HiveQL, Pig Latin and MapReduce programs in Java.
  • Expertise in writing MapReduce programs and UDFs for both Hive and Pig in Java; extended Hive and Pig core functionality with custom UDFs.
  • Experience in developing scalable solutions using NoSQL databases including HBase, Cassandra, MongoDB and CouchDB.
  • Extracted files from NoSQL databases such as CouchDB and HBase through Flume and placed them in HDFS for processing.
  • Proficient with the Hive data warehouse tool: creating tables, distributing data by implementing partitioning and bucketing strategies, and writing and optimizing HiveQL queries.
  • Experienced in performing analytics on structured data using Hive queries, operations, joins, query tuning, SerDes and UDFs.
  • Good experience working with different Hadoop file formats such as SequenceFile, RCFile, ORC, Avro and Parquet.
  • Experience in using modern Big Data tools such as Spark SQL to convert schema-less data into more structured files for further analysis.
  • Experience in using Spark Streaming to receive real-time data and store the streamed data in HDFS.
  • Experienced in building Storm topologies, spouts and bolts to stream data from sources and pre-process it.
  • Extensive experience working with different Spark modules such as Spark transformations, MLlib, GraphX, Spark Streaming and Spark SQL.
  • Good experience in writing MapReduce jobs using native Java code, Pig and Hive for various business use cases.
  • Experience in processing data serialization formats such as XML, JSON and SequenceFiles.
  • Experience in working with Apache Sqoop to import and export data to and from HDFS and Hive.
  • Good working experience in designing Oozie workflows for cleaning data and storing it in Hive tables for quick analysis.
  • Good knowledge of streaming data from multiple sources into HDFS using Flume and Kafka.
  • Knowledge of processing and analyzing real-time data streams/flows using Kafka and HBase.
  • Experience with Informatica Power Center Big Data Edition (BDE) for high-speed Data Ingestion and Extraction.
  • Hands-on experience with Amazon EMR, Cloudera (CDH4 and CDH5) and Hortonworks Hadoop distributions.

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, Spark, Kafka, NiFi, MapReduce, Pig, Hive, Impala, HBase, Elasticsearch, Cassandra, Sqoop, Oozie, ZooKeeper, Flume, Storm, YARN, MongoDB, Ranger, Mahout, Falcon, Avro, AWS.

Java & J2EE Technologies: Core Java, Hibernate, Spring, JSP, Servlets, JavaBeans, JDBC, EJB 3.0, JMS, JMX, RMI.

IDE Tools: Eclipse, IntelliJ.

Programming languages: Java, Python, Scala, C, C++, MATLAB, SAS, PHP, SQL, PL/SQL.

Web Services & Technologies: XML, HTML, XHTML, JNDI, HTML5, AJAX, jQuery, JSON, CSS, JavaScript, AngularJS, VBScript, WSDL, SOAP and RESTful.

ETL tools: Pentaho, Talend, Informatica (MDM, IDQ, TPT), Teradata.

Databases: Oracle, SQL Server, MySQL, DB2, NoSQL.

Application Servers: Apache Tomcat, WebLogic, WebSphere, JBoss.

Tools: Maven, SBT, Ant, JUnit, Log4j.

Operating Systems: Windows, UNIX, Linux, Mac OS.

PROFESSIONAL EXPERIENCE:

Confidential, Cincinnati, OH

Big Data Engineer / Hadoop Developer

Responsibilities:

  • Installed and configured Apache Hadoop clusters using YARN for application development, along with Apache tools such as Hive, Pig, HBase, Spark, ZooKeeper, Flume, Kafka and Sqoop.
  • Successfully developed and deployed many modules using Spark, Hive, Sqoop, Shell, Pig, Scala and Python.
  • Transferred data between databases and HDFS with Sqoop, and used Flume in parallel to stream log data from servers.
  • Converted Hive and SQL queries to Spark using Spark RDDs in Scala and Python.
  • Designed and deployed multiple POCs using Scala on a YARN cluster, and checked the performance of Spark with Cassandra and SQL.
  • Involved in loading data from the UNIX file system into HDFS.
  • Generated Sqoop scripts for data ingestion into the Hadoop environment.
  • Used the Spark API on YARN to perform data analytics on Hive data.
  • Created and scheduled multiple tasks for incremental loads into staging tables.
  • Loaded log data and data from UI apps into the Hadoop data lake using Apache Kafka.
  • Transformed data and performed data quality checks with Pig before loading it into HDFS.
  • Created partitioned Hive external tables to load the processed data produced by MapReduce.
  • Ran analytical algorithms on HDFS data using MapReduce programs.
  • Merged data from different sources using Hive joins and performed ad-hoc queries.
  • Designed Hive GenericUDFs to perform record-level business logic operations.
  • Implemented data classification algorithms using MapReduce design patterns.
  • Extensively worked on combiners, partitioning and the distributed cache to improve the performance of MapReduce jobs.
  • Designed an Oozie workflow to automate loading data into HDFS and pre-processing it with Pig, and used ZooKeeper to coordinate the clusters.
  • Handled different file formats such as Text, Avro and Parquet, and compression codecs such as Snappy, bzip2 and gzip.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Gained experience with NoSQL databases such as HBase and Cassandra.
  • Troubleshot the cluster by reviewing Hadoop log files.
  • Worked on multiple data formats on HDFS using Spark.
  • Used ZooKeeper for various types of centralized configuration.

Environment: Hadoop 2.3.0, Spark Core, Cassandra, Spark SQL, SparkR, PySpark, Hive, Pig, Sqoop, ZooKeeper, Control-M, Java and UNIX Shell Scripting.

Confidential, New York

Hadoop Developer

Responsibilities:

  • Built a workflow to export Cassandra column family data to CSV and load it into Pig; used the Avro data serialization system to work with JSON data formats.
  • Involved in managing deployments using XML scripts.
  • Developed Spark SQL scripts and was involved in converting Hive UDFs to Spark SQL UDFs.
  • Performed operations on data stored in HDFS and other NoSQL databases in both batch-oriented and ad-hoc contexts.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing; involved in loading data from the Linux file system into HDFS.
  • Ran batch processes using Pig scripts and developed Pig UDFs for data manipulation according to business requirements.
  • Accessed Hive tables from Java applications using JDBC to perform analytics.
  • Used the partitioning pattern in MapReduce to move records into categories.
  • Commissioned and decommissioned nodes in the Hadoop cluster.
  • Imported and exported data between RDBMS and HDFS using data ingestion tools such as Sqoop.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Pig scripts.
  • Performed unit testing with JUnit and integration testing in the staging environment.
  • Followed Agile and Scrum principles in developing the project.
  • Performed integration, code review and refinement of code written by team members; mentored team members and solved critical technical issues they faced.
  • Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Java, Cloudera Distribution, Cassandra, HTML, JavaScript, XML, XSLT, jQuery, AJAX, Web Services.

Environment: Hadoop, MapReduce, HDFS, Hive, Java, Jira, Python, SQL, Cloudera Manager, Spark, AWS, Cassandra, Pig, Sqoop, Oozie, ZooKeeper, Storm, Flume, Azkaban, Solr, Talend Open Studio, Teradata, Scala, PL/SQL, MySQL, NoSQL, Elasticsearch, Windows, Hortonworks, HBase.

Confidential

Hadoop Developer

Responsibilities:

  • Installed, configured and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper and Sqoop.
  • Installed the cluster, commissioned and decommissioned DataNodes, performed NameNode recovery and capacity planning, configured Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
  • Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging the data in HDFS for further analysis.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Worked on installation planning and slot configuration.
  • Implemented NameNode backup using NFS for high availability.
  • Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Used Sqoop to import and export data between HDFS and RDBMS.
  • Created Hive tables and was involved in loading data and writing Hive UDFs.
  • Exported the analyzed data to relational databases using Sqoop for visualization and report generation.
  • Involved in migrating ETL processes from Oracle to Hive to test ease of data manipulation.
  • Worked on NoSQL databases including HBase, MongoDB and Cassandra.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.

Environment: Hadoop, NoSQL, Cassandra, MongoDB, Sqoop, HDFS, HBase, Oozie, Pig Latin, Hive, Flume, MapReduce, Java, Eclipse, NetBeans.

Confidential

Java Developer

Responsibilities:

  • Actively involved in Analysis, Detail Design, Development, System Testing and User Acceptance Testing.
  • Developed an intranet web application using J2EE architecture, with JSP for the user interfaces, JSP tag libraries for custom tags and JDBC for database connectivity.
  • Implemented the Struts (MVC) framework: developed the Action Servlet and Action Form beans, configured the struts-config descriptor and implemented the Validator framework.
  • Extensively involved in database design with Oracle Database and in building the application on J2EE architecture.
  • Integrated messaging using the IBM MQSeries classes for JMS, which provide an XML message-based interface; the application uses the JMS publish-and-subscribe model.
  • Developed the EJB session bean that acts as a facade and accesses the business entities through their local home interfaces.
  • Evaluated and worked with the EJB Container-Managed Persistence strategy.
  • Used web services (WSDL and SOAP) to retrieve loan information from a third party, and used SAX and DOM XML parsers for data retrieval.
  • Wrote DTDs for XML document exchange; generated, parsed and displayed XML in various formats using XSLT and CSS.
  • Used the SVN version control system for source code and project management.
  • Used XPath 1.0 to select nodes and XQuery to extract and manipulate data from XML documents.
  • Coded, tested and deployed the web application using RAD 7.0 and WebSphere Application Server 6.0.
  • Used JavaScript to validate client-side data.
  • Wrote unit tests for the implemented bean code using JUnit.
  • Worked extensively in a UNIX environment.
  • Data is exchanged in XML format, which helps in interoperability with other software applications.

Environment: Struts 2, JMS, EJB, JSP, RAD 7.0, WebSphere Application Server 6.0, XML parsers, XSLT, XQuery, XPath 1.0, HTML, CSS, JavaScript, IBM MQSeries, JBoss, Ant, JUnit, SVN, JDBC, Oracle, Unix.