
Hadoop (Big Data) Developer/Admin Resume


Minneapolis, MN

SUMMARY

  • 8+ years of professional experience in Software Development and Requirement Analysis in Agile work environment with 4+ years of Big Data Ecosystems experience in ingestion, storage, querying, processing and analysis of Big Data.
  • Experience dealing with Apache Hadoop components like HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Oozie, Mahout, Spark, Storm, Cassandra and MongoDB, as well as Python and Big Data analytics.
  • Good understanding/knowledge of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
  • Experienced managing NoSQL databases on large Hadoop distributions such as Cloudera, Hortonworks HDP, and MapR M-series.
  • Experienced developing Hadoop integration for data ingestion, data mapping and data process capabilities.
  • Experienced in building analytics for structured and unstructured data and managing large data ingestion using technologies like Kafka/Avro/Thrift.
  • Worked with various data sources such as flat files and RDBMS (Teradata, SQL Server 2005, Netezza and Oracle). Extensive ETL work consisting of data transformation, data sourcing, mapping, and conversion.
  • Exceptional ability to quickly master new concepts and capable of working in groups as well as independently.
  • Has good knowledge of virtualization and worked on VMware Virtual Center.
  • Excellent working knowledge of different statistical analysis tools like SPSS and Microsoft Excel.
  • Hands on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, Storm, Spark, Kafka and Flume.
  • Strong understanding of Data Modeling and experience with Data Cleansing, Data Profiling and Data analysis.
  • Experience in ETL (DataStage) analysis, design, development, testing and implementation of ETL processes, including performance tuning and query optimization of databases.
  • Experience in extracting source data from Sequential files, XML files, Excel files, transforming and loading it into the Confidential data warehouse.
  • Strong experience with Java/J2EE technologies such as Core Java, JDBC, JSP, JSTL, HTML, JavaScript, JSON
  • Proficiency in programming with different IDEs like Eclipse and NetBeans.
  • Involved in database design, creating Tables, Views, Stored Procedures, Functions, Triggers and Indexes.
  • Good understanding of service-oriented architecture (SOA) and web services technologies like XML, XSD, WSDL, and SOAP.
  • Good knowledge of scalable, secure cloud architecture based on Amazon Web Services (leveraging AWS cloud services: EC2, CloudFormation, VPC, S3, etc.).
  • Good Knowledge on Hadoop Cluster architecture and monitoring the cluster.
  • In-depth understanding of Data Structure and Algorithms.
  • Experience in managing and troubleshooting Hadoop related issues.
  • Expertise in setting up standards and processes for Hadoop based application design and implementation.
  • Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
  • Experience in Object Oriented Analysis, Design (OOAD) and development of software using UML Methodology, good knowledge of J2EE design patterns and Core Java design patterns.
  • Experience in managing Hadoop clusters using Cloudera Manager.
  • Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
  • Extensive experience working with Oracle, Netezza, DB2, SQL Server and MySQL databases.
  • Hands on experience with VPN, PuTTY, WinSCP, VNC Viewer, etc.
  • Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.
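The MapReduce concepts cited above reduce to a three-stage pipeline: map emits key/value pairs, the framework shuffles them by key, and reduce aggregates each group. A minimal conceptual sketch in plain Python (illustrative only, not Hadoop API code; all function names are invented):

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}

# Classic word count expressed in this model.
def word_mapper(line):
    for word in line.split():
        yield word.lower(), 1

def sum_reducer(word, counts):
    return sum(counts)

lines = ["Hadoop stores data in HDFS", "MapReduce processes data in HDFS"]
counts = reduce_phase(shuffle(map_phase(lines, word_mapper)), sum_reducer)
print(counts["hdfs"])  # prints 2
```

In real Hadoop the shuffle is performed by the framework between distributed map and reduce tasks; the sketch only mirrors the data flow.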

TECHNICAL SKILLS

Hadoop Ecosystem: HDFS, MapReduce, Oozie, Hive, Pig, Sqoop, Flume, ZooKeeper, HBase, Cassandra

NoSQL: HBase, Cassandra, MongoDB

Databases: MS SQL Server 2000/2005/2008/2012, MySQL, Oracle 9i/10g

Languages: Java (JDK 1.4/1.5/1.6), C/C++, SQL, PL/SQL

Operating Systems: Windows Server 2000/2003/2008, Windows XP/Vista, Mac OS, UNIX, Linux

Java Technologies: Servlets, JavaBeans, JDBC, JNDI

Frameworks: JUnit and JTest

IDEs & Utilities: Eclipse, Maven, NetBeans

SQL Server Tools: SQL Server Management Studio, Enterprise Manager, Query Analyzer, Profiler, Export & Import (DTS)

Web Dev. Technologies: ASP.NET, HTML, XML

PROFESSIONAL EXPERIENCE

Confidential - Minneapolis, MN

Hadoop (Big Data) Developer/Admin

Responsibilities:

  • Installed, configured, and maintained Apache Hadoop clusters for application development and major components of Hadoop Ecosystem: Hive, Pig, HBase, Sqoop, Flume, Oozie and Zookeeper.
  • Used Sqoop to transfer data between RDBMS and HDFS.
  • Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
  • Implemented complex MapReduce programs to perform map-side joins using the distributed cache.
  • Designed and implemented custom Writables, custom input formats, custom partitioners and custom comparators in MapReduce.
  • Thoroughly tested MapReduce programs using the MRUnit and JUnit testing frameworks.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Converted existing SQL queries into HiveQL queries.
  • Implemented UDFs, UDAFs and UDTFs in Java for Hive to process data that cannot be handled by Hive's built-in functions.
  • Effectively used Oozie to build automated workflows of Sqoop, MapReduce and Hive jobs.
  • Exported the analyzed data into relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Gathered the business requirements from the Business Partners and Subject Matter Experts.
  • Utilized the Agile Scrum methodology to help manage and organize a team of 4 developers, with regular code review sessions.
  • Held weekly meetings with technical collaborators and actively participated in code reviews with senior and junior developers.
  • Loaded and analyzed Omniture logs generated by different web applications.
  • Loaded and transformed large sets of structured, semi structured and unstructured data in various formats like text, zip, XML and JSON.
  • Refined the Website clickstream data from Omniture logs and moved it into Hive.
  • Wrote multiple MapReduce programs for data extraction, transformation and aggregation from multiple file formats, including XML, JSON, CSV and other compressed formats.
  • Defined job flows and developed simple to complex MapReduce jobs as per requirements.
  • Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
  • Developed Pig UDFs for manipulating data according to business requirements, and worked on developing custom Pig loaders.
  • Worked on developing ETL processes (DataStage) to load data from multiple data sources into HDFS using Flume and Sqoop, and performed structural modifications using MapReduce and Hive.
  • Responsible for creating Hive tables based on business requirements.
  • Developed Scala and SQL code to extract data from various databases.
  • Worked on regular expression related text-processing using the in-memory computing capabilities of Spark using Scala.
  • Implemented partitioning, dynamic partitions and buckets in Hive for efficient data access.
  • Involved in NoSQL database design, integration and implementation.
  • Loaded data into the NoSQL database HBase.
  • Handled Hive queries using Spark SQL integrated with the Spark environment.
  • Also explored the Spark MLlib library for a POC on recommendation engines.
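The map-side join mentioned above avoids the reduce-side shuffle by shipping a small table to every mapper through the distributed cache and joining against it in memory. A hedged Python sketch of the pattern (table contents and field names are invented for illustration):

```python
# Small dimension table that would be shipped to every mapper via the
# DistributedCache and loaded once in the mapper's setup() method.
dept_lookup = {"d01": "Finance", "d02": "Engineering"}

def map_side_join(employee_record, lookup):
    """Join one large-table record against the in-memory lookup table,
    emitting the enriched record -- no shuffle or reduce phase required."""
    emp_id, name, dept_id = employee_record
    return emp_id, name, lookup.get(dept_id, "UNKNOWN")

# Records from the large table, streamed through the mapper one at a time.
employees = [("e1", "Ana", "d02"), ("e2", "Raj", "d01"), ("e3", "Lee", "d99")]
joined = [map_side_join(e, dept_lookup) for e in employees]
print(joined[0])  # prints ('e1', 'Ana', 'Engineering')
```

The pattern is only viable when one side of the join fits comfortably in each task's memory; otherwise a reduce-side join is needed.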

Environment: Hadoop, CDH4, MapReduce, HDFS, Pig, Hive, Impala, Oozie, Java, Spark, Kafka, Flume, Storm, Knox, Linux, Scala, Maven, JavaScript, Oracle 11g/10g, SVN

Confidential, Calabasas, CA

Hadoop Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, the HBase database and Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Implemented a nine-node CDH3 Hadoop cluster on CentOS.
  • Implemented the Apache Crunch library on top of MapReduce and Spark for data aggregation.
  • Involved in loading data from the Linux file system into HDFS.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios.
  • Implemented a script to transmit sysprin information from Oracle to HBase using Sqoop.
  • Implemented best-income logic using Pig scripts and UDFs.
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked on tuning the performance of Pig queries.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Managed data coming from different sources.
  • Involved in loading data from the file system to HDFS.
  • Used Impala for high-throughput SQL queries.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Provided cluster coordination services through ZooKeeper.
  • Managed and reviewed Hadoop log files.
  • Managed jobs using the Fair Scheduler.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
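The aggregate-and-report work above is, in Pig terms, a GROUP BY followed by a SUM over each group. A conceptual Python equivalent (sample records and field names are made up):

```python
from collections import defaultdict

# Sample (source, bytes) records standing in for loaded log data.
records = [("web", 120), ("mobile", 80), ("web", 40), ("batch", 300)]

def group_and_sum(rows):
    """Pig-style aggregation, roughly:
       grouped = GROUP rows BY source;
       totals  = FOREACH grouped GENERATE group, SUM(rows.bytes);"""
    totals = defaultdict(int)
    for source, size in rows:
        totals[source] += size
    return dict(totals)

print(group_and_sum(records))  # prints {'web': 160, 'mobile': 80, 'batch': 300}
```

In Pig the same grouping is distributed across reducers; the sketch only shows the per-key arithmetic.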

Environment: Hadoop, HDFS, Pig, Sqoop, HBase, Shell Scripting, CDH3, CentOS

Confidential, San Mateo, CA

Hadoop Developer

Responsibilities:

  • Involved in review of functional and non-functional requirements.
  • Facilitated knowledge transfer sessions.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Experienced in defining job flows.
  • Experienced in managing and reviewing Hadoop log files.
  • Connected to external servers through VPN, PuTTY and VNC Viewer.
  • Extracted files from CouchDB through Sqoop, placed them in HDFS and processed them.
  • Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Responsible for managing data coming from different sources.
  • Gained good experience with NoSQL databases.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from UNIX file system to HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Gained very good business knowledge of health insurance, claim processing, fraud suspect identification, the appeals process, etc.
  • Developed a custom file system plug-in for Hadoop so it can access files on the data platform; the plug-in allows Hadoop MapReduce programs, HBase, Pig and Hive to work unmodified and access files directly.
  • Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
  • Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
  • Set up and benchmarked Hadoop/HBase clusters for internal use.
  • Set up a Hadoop cluster on Amazon EC2 using Apache Whirr for a POC.
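Hadoop Streaming jobs like the XML-processing runs above are plain scripts that read records on stdin and emit tab-separated key/value lines; the framework sorts mapper output by key before the reducer sees it. A hedged sketch of such a mapper/reducer pair, simulated locally (the `<record>` tag and counting logic are assumed examples, not the project's actual job):

```python
import itertools

def mapper(lines):
    """Streaming mapper: emit 'record<TAB>1' for each <record> element seen."""
    for line in lines:
        if "<record>" in line:
            yield "record\t1"

def reducer(lines):
    """Streaming reducer: input arrives sorted by key; sum counts per key."""
    parsed = (line.split("\t") for line in lines)
    for key, group in itertools.groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(v) for _, v in group)}"

# Simulate the map -> sort -> reduce pipeline that Hadoop Streaming runs.
xml_lines = ["<record>", "<x/>", "<record>", "<record>"]
sorted_map_output = sorted(mapper(xml_lines))
print(list(reducer(sorted_map_output)))  # prints ['record\t3']
```

On a cluster the same two scripts would be passed to `hadoop-streaming` as the mapper and reducer, with HDFS paths for input and output.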

Environment: Hadoop, MapReduce, HDFS, Hive, Sqoop, HBase, UNIX Shell Scripting.

Confidential

Java Testing & SVN Admin

Responsibilities:

  • Developed MapReduce programs in Java for parsing the raw data and populating staging tables
  • Worked on both WebLogic Portal 9.2 for portal development and WebLogic 8.1 for Data Services programming
  • Used Eclipse 6.0 as the IDE for application development.
  • Involved in writing test cases using a set of conditions to test the application
  • Configured the Struts framework to implement the MVC design pattern
  • Built SQL queries for fetching the required columns and data from the database.
  • Used Subversion as the version control system
  • Managed the SVN-related responsibilities and maintained the versions accordingly.
  • Performed SVN check-ins and check-outs.
  • Used Hibernate for handling database transactions and persisting objects
  • Used AJAX for interactive user operations and client-side validations
  • Developed Ant scripts for compilation and deployment
  • Performed unit testing using JUnit
  • Extensively used Log4j for logging

Environment: Java/J2EE, SQL, PL/SQL, JSP, EJB, Struts, SVN, JDBC, XML, XSLT, UML, JUnit, Log4j

Confidential

Java Developer

Responsibilities:

  • Involved in Requirement Analysis, Development and Documentation.
  • Used MVC architecture (Jakarta Struts framework) for Web tier.
  • Participated in developing the form beans and action mappings required for the Struts implementation and its validation framework.
  • Developed front-end screens with JSP using Eclipse.
  • Involved in development of the Medical Records module.
  • Responsible for development of the functionality using Struts and EJB components.
  • Coded DAO objects using JDBC (DAO pattern).
  • Used XML and XSDs to define data formats.
  • Implemented J2EE design patterns (Value Object, Singleton, DAO) for the presentation, business and integration tiers of the project.
  • Involved in Bug fixing and functionality enhancements.
  • Designed and developed excellent Logging Mechanism for each order process using Log4J.
  • Involved in writing Oracle SQL Queries.
  • Involved in Check-in and Checkout process using CVS.
  • Developed additional functionality in the software as per business requirements.
  • Involved in requirement analysis and complete development of client side code.
  • Followed Sun coding and documentation standards.
  • Participated in project planning with business analysts and team members to analyze business requirements, and translated them into working software.
  • Developed software application modules using disciplined software development process.

Environment: Java, J2EE, JSP, EJB, Ant, Struts 1.2, Log4J, WebLogic 7.0, JDBC, MyEclipse, Windows XP, CVS, Oracle.
