Hadoop (Big Data) Developer/Admin Resume
Minneapolis, MN
SUMMARY
- 8+ years of professional experience in Software Development and Requirement Analysis in an Agile work environment, with 4+ years of Big Data ecosystem experience in ingestion, storage, querying, processing and analysis of Big Data.
- Experience working with Apache Hadoop ecosystem components such as HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Oozie, Mahout, Spark, Storm, Cassandra and MongoDB, as well as Python and Big Data analytics.
- Good understanding/knowledge of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
- Experienced in managing NoSQL databases on large Hadoop distributions such as Cloudera, Hortonworks HDP and MapR M-series.
- Experienced in developing Hadoop integrations for data ingestion, data mapping and data processing capabilities.
- Experienced in building analytics for structured and unstructured data and managing large data ingestion using technologies like Kafka/Avro/Thrift.
- Worked with various data sources such as flat files and RDBMSs (Teradata, SQL Server 2005, Netezza and Oracle). Extensive work on ETL processes consisting of data sourcing, transformation, mapping and conversion.
- Exceptional ability to quickly master new concepts; capable of working in groups as well as independently.
- Has good knowledge of virtualization and worked on VMware Virtual Center.
- Excellent working knowledge of different statistical analysis tools like SPSS and Microsoft Excel.
- Hands on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, Storm, Spark, Kafka and Flume.
- Strong understanding of Data Modeling and experience with Data Cleansing, Data Profiling and Data analysis.
- Experience in ETL (DataStage) analysis, design, development, testing and implementation of ETL processes, including performance tuning and query optimization of databases.
- Experience in extracting source data from Sequential files, XML files, Excel files, transforming and loading it into the Confidential data warehouse.
- Strong experience with Java/J2EE technologies such as Core Java, JDBC, JSP, JSTL, HTML, JavaScript, JSON
- Proficiency in programming with different IDEs such as Eclipse and NetBeans.
- Involved in database design, creating Tables, Views, Stored Procedures, Functions, Triggers and Indexes.
- Good understanding of service-oriented architecture (SOA) and web services technologies such as XML, XSD, WSDL and SOAP.
- Good knowledge of scalable, secure cloud architecture based on Amazon Web Services (leveraging AWS cloud services: EC2, CloudFormation, VPC, S3, etc.).
- Good Knowledge on Hadoop Cluster architecture and monitoring the cluster.
- In-depth understanding of data structures and algorithms.
- Experience in managing and troubleshooting Hadoop related issues.
- Expertise in setting up standards and processes for Hadoop based application design and implementation.
- Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
- Experience in Object Oriented Analysis, Design (OOAD) and development of software using UML Methodology, good knowledge of J2EE design patterns and Core Java design patterns.
- Experience in managing Hadoop clusters using Cloudera Manager.
- Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
- Extensive experience working with Oracle, Netezza, DB2, SQL Server and MySQL databases.
- Hands on experience with VPN, PuTTY, WinSCP, VNC Viewer, etc.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.
TECHNICAL SKILLS
Hadoop Ecosystem: HDFS, MapReduce, Oozie, Hive, Pig, Sqoop, Flume, ZooKeeper, HBase, Cassandra
NoSQL: HBase, Cassandra, MongoDB
Databases: MS SQL Server 2000/2005/2008/2012, MySQL, Oracle 9i/10g
Languages: Java (JDK 1.4/1.5/1.6), C/C++, SQL, PL/SQL
Operating Systems: Windows Server 2000/2003/2008, Windows XP/Vista, Mac OS, UNIX, Linux
Java Technologies: Servlets, JavaBeans, JDBC, JNDI
Frameworks: JUnit and JTest
IDE’s & Utilities: Eclipse, Maven, NetBeans.
SQL Server Tools: SQL Server Management Studio, Enterprise Manager, Query Analyzer, Profiler, Export & Import (DTS)
Web Dev. Technologies: ASP.NET, HTML, XML
PROFESSIONAL EXPERIENCE
Confidential - Minneapolis, MN
Hadoop (Big Data) Developer/Admin
Responsibilities:
- Installed, configured and maintained Apache Hadoop clusters for application development, along with major Hadoop ecosystem components: Hive, Pig, HBase, Sqoop, Flume, Oozie and ZooKeeper.
- Used Sqoop to transfer data between RDBMS and HDFS.
- Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
- Implemented complex MapReduce programs to perform map-side joins using the distributed cache.
- Designed and implemented custom Writables, custom InputFormats, custom partitioners and custom comparators in MapReduce.
- Thoroughly tested MapReduce programs using the MRUnit and JUnit testing frameworks.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Converted existing SQL queries into HiveQL queries.
- Implemented UDFs, UDAFs and UDTFs in Java for Hive to handle processing that could not be performed with Hive's built-in functions (a minimal illustrative sketch follows this list).
- Effectively used Oozie to develop automated workflows of Sqoop, MapReduce and Hive jobs.
- Exported the analyzed data into relational databases using Sqoop for visualization and to generate reports for the BI team.
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
- Loaded and analyzed Omniture logs generated by different web applications.
- Loaded and transformed large sets of structured, semi structured and unstructured data in various formats like text, zip, XML and JSON.
- Refined the Website clickstream data from Omniture logs and moved it into Hive.
- Wrote multiple MapReduce programs to power data for extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV & other compressed file formats.
- Defined job flows and developed simple to complex Map Reduce jobs as per the requirement.
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Developed Pig UDFs for manipulating data according to business requirements and also worked on developing custom Pig loaders.
- Worked on developing ETL processes (DataStage Open Studio) to load data from multiple data sources into HDFS using Flume and Sqoop, and performed structural modifications using MapReduce and Hive.
- Responsible for creating Hive tables based on business requirements.
- Developed Scala and SQL code to extract data from various databases.
- Worked on regular-expression-based text processing using the in-memory computing capabilities of Spark with Scala.
- Implemented Partitioning, Dynamic Partitions and Buckets in Hive for efficient data access.
- Involved in NoSQL database design, integration and implementation.
- Loaded data into the NoSQL database HBase.
- Knowledge of handling Hive queries using Spark SQL integrated with the Spark environment.
- Also explored the Spark MLlib library for a POC on recommendation engines.
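The following is a minimal, illustrative sketch of the kind of Java Hive UDF referenced above, not code from the project: the package, class name MaskAccountId and masking logic are assumptions for the example, written against the classic org.apache.hadoop.hive.ql.exec.UDF API.

```java
package com.example.hive.udf; // hypothetical package, for illustration only

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * Illustrative Hive UDF: masks all but the last four characters of an ID.
 * Registered in Hive with:
 *   ADD JAR hive-udfs.jar;
 *   CREATE TEMPORARY FUNCTION mask_id AS 'com.example.hive.udf.MaskAccountId';
 */
public final class MaskAccountId extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                        // let NULLs pass through
        }
        String s = input.toString();
        if (s.length() <= 4) {
            return new Text(s);                 // too short to mask
        }
        StringBuilder masked = new StringBuilder();
        for (int i = 0; i < s.length() - 4; i++) {
            masked.append('*');                 // mask the leading characters
        }
        masked.append(s.substring(s.length() - 4));
        return new Text(masked.toString());
    }
}
```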
Environment: Hadoop, CDH4, MapReduce, HDFS, Pig, Hive, Impala, Oozie, Java, Spark, Kafka, Flume, Storm, Knox, Linux, Scala, Maven, JavaScript, Oracle 11g/10g, SVN
Confidential, Calabasas, CA
Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster and different big data analytics tools, including Pig, the HBase database and Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop.
- Implemented a nine-node CDH3 Hadoop cluster on CentOS.
- Implemented the Apache Crunch library on top of MapReduce and Spark for data aggregation.
- Involved in loading data from the Linux file system to HDFS.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slot configuration.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Implemented a script to transmit sysprin information from Oracle to HBase using Sqoop.
- Implemented best-income logic using Pig scripts and UDFs (a minimal illustrative sketch follows this list).
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Responsible for managing data coming from different sources.
- Involved in loading data from file system to HDFS.
- Used Impala for high-throughput SQL queries.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Provided cluster coordination services through ZooKeeper.
- Experience in managing and reviewing Hadoop log files.
- Managed jobs using the Fair Scheduler.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
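A minimal sketch of a Java Pig UDF in the spirit of the best-income logic mentioned above; the package, class name BestIncome, its two-argument contract and the higher-of-two rule are illustrative assumptions, not the project's actual logic.

```java
package com.example.pig.udf; // hypothetical package, for illustration only

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

/**
 * Illustrative Pig UDF: picks the higher of two income figures.
 * Used from a Pig script as:
 *   REGISTER pig-udfs.jar;
 *   result = FOREACH records GENERATE acct_id,
 *            com.example.pig.udf.BestIncome(stated_income, modeled_income);
 */
public class BestIncome extends EvalFunc<Double> {
    @Override
    public Double exec(Tuple input) throws IOException {
        if (input == null || input.size() < 2) {
            return null;                          // nothing to compare
        }
        Double stated = toDouble(input.get(0));
        Double modeled = toDouble(input.get(1));
        if (stated == null) {
            return modeled;
        }
        if (modeled == null) {
            return stated;
        }
        return Math.max(stated, modeled);         // "best" = higher of the two figures
    }

    private Double toDouble(Object value) {
        if (value == null) {
            return null;
        }
        return Double.parseDouble(value.toString()); // assumes numeric fields
    }
}
```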
Environment: Hadoop, HDFS, Pig, Sqoop, HBase, Shell Scripting, CDH3, CentOS
Confidential, San Mateo, CA
Hadoop Developer
Responsibilities:
- Involved in review of functional and non-functional requirements.
- Facilitated knowledge transfer sessions.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing (a minimal illustrative sketch follows this list).
- Imported and exported data into HDFS and Hive using Sqoop.
- Experienced in defining job flows.
- Experienced in managing and reviewing Hadoop log files.
- Connected to external servers through VPN, PuTTY and VNC Viewer.
- Extracted files from CouchDB through Sqoop, placed them in HDFS and processed them.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Responsible for managing data coming from different sources.
- Gained good experience with NoSQL databases.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from the UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Gained very good business knowledge of health insurance, claim processing, fraud suspect identification, the appeals process, etc.
- Developed a custom file system plug-in for Hadoop so it can access files on the Data Platform.
- This plug-in allows Hadoop MapReduce programs, HBase, Pig and Hive to work unmodified and access files directly.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
- Set up and benchmarked Hadoop/HBase clusters for internal use.
- Set up a Hadoop cluster on Amazon EC2 using Whirr for a POC.
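A minimal sketch of the kind of Java data-cleaning mapper described above; the pipe-delimited input, expected field count and counter names are illustrative assumptions, not the project's actual code.

```java
package com.example.mr; // hypothetical package, for illustration only

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Illustrative map-only cleaning step: trims fields, drops malformed rows,
 * and counts rejections so the numbers can be verified after the run.
 */
public class CleanRecordsMapper
        extends Mapper<LongWritable, Text, NullWritable, Text> {

    private static final int EXPECTED_FIELDS = 5;    // assumed record width

    enum Quality { GOOD, MALFORMED }                 // counters shown in job output

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\\|", -1);    // assumed pipe-delimited input
        if (fields.length != EXPECTED_FIELDS) {
            context.getCounter(Quality.MALFORMED).increment(1);  // reject bad rows
            return;
        }
        StringBuilder cleaned = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                cleaned.append('|');
            }
            cleaned.append(fields[i].trim());                    // normalize whitespace
        }
        context.getCounter(Quality.GOOD).increment(1);
        context.write(NullWritable.get(), new Text(cleaned.toString()));
    }
}
```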
Environment: Hadoop, MapReduce, HDFS, Hive, Sqoop, HBase, UNIX Shell Scripting.
Confidential
Java Testing & SVM admin
Responsibilities:
- Developed MapReduce programs in Java for parsing the raw data and populating staging tables.
- Worked on both WebLogic Portal 9.2 for Portal development and WebLogic 8.1 for Data Services Programming
- Used Eclipse 6.0 as IDE for application development.
- Involved in writing test cases using sets of conditions to test the application.
- Configured the Struts framework to implement the MVC design pattern.
- Built SQL queries for fetching the required columns and data from the database.
- Used Subversion as the version control system
- Managed SVN-related responsibilities and maintained the versions accordingly.
- Performed SVN check-ins and check-outs.
- Used Hibernate for handling database transactions and persisting objects
- Used AJAX for interactive user operations and client side validations
- Developed Ant scripts for compilation and deployment.
- Performed unit testing using JUnit (a minimal illustrative sketch follows this list).
- Extensively used Log4j for application logging.
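A minimal sketch of the JUnit testing pattern referenced above, in JUnit 4 style; OrderIdValidator is a hypothetical helper defined inline so the example compiles on its own, and only the pattern reflects the work described.

```java
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

/**
 * Illustrative JUnit 4 test class exercising a small set of conditions
 * against a hypothetical validator.
 */
public class OrderIdValidatorTest {

    /** Hypothetical helper: an order id looks like "ORD-" followed by digits. */
    static class OrderIdValidator {
        boolean isValid(String orderId) {
            return orderId != null && orderId.matches("ORD-\\d+");
        }
    }

    private final OrderIdValidator validator = new OrderIdValidator();

    @Test
    public void acceptsWellFormedOrderId() {
        assertTrue(validator.isValid("ORD-10042"));
    }

    @Test
    public void rejectsMissingPrefix() {
        assertFalse(validator.isValid("10042"));
    }

    @Test
    public void rejectsNull() {
        assertFalse(validator.isValid(null));
    }
}
```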
Environment: Java/J2EE, SQL, PL/SQL, JSP, EJB, Struts, SVN, JDBC, XML, XSLT, UML, JUnit, Log4j
Confidential
Java Developer
Responsibilities:
- Involved in Requirement Analysis, Development and Documentation.
- Used MVC architecture (Jakarta Struts framework) for Web tier.
- Participated in developing the form beans and action mappings required for the Struts implementation, and in the validation framework using Struts.
- Developed front-end screens with JSP using Eclipse.
- Involved in Development of Medical Records module.
- Responsible for development of the functionality using Struts and EJB components.
- Coded DAO objects using JDBC (DAO pattern); a minimal illustrative sketch follows this list.
- Used XML and XSDs to define data formats.
- Implemented J2EE design patterns (Value Object, Singleton, DAO) across the presentation, business and integration tiers of the project.
- Involved in Bug fixing and functionality enhancements.
- Designed and developed a logging mechanism for each order process using Log4J.
- Involved in writing Oracle SQL Queries.
- Involved in Check-in and Checkout process using CVS.
- Developed additional functionality in the software as per business requirements.
- Involved in requirement analysis and complete development of client side code.
- Followed Sun coding and documentation standards.
- Participated in project planning with business analysts and team members to analyze business requirements and translate them into working software.
- Developed software application modules using a disciplined software development process.
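A minimal sketch of the JDBC-based DAO pattern referenced above; MedicalRecordDao, the table and column names are illustrative assumptions tied loosely to the Medical Records module, and try-with-resources is used for brevity even though the original work targeted older JDKs (where resources would be closed in finally blocks).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import javax.sql.DataSource;

/**
 * Illustrative JDBC DAO: a DataSource (typically obtained via JNDI in J2EE)
 * goes in, and SQL stays hidden behind a narrow data-access method.
 */
public class MedicalRecordDao {

    private final DataSource dataSource;

    public MedicalRecordDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /** Minimal value object carried back to the business tier. */
    public static class MedicalRecord {
        public long id;
        public String patientName;
        public String diagnosis;
    }

    public MedicalRecord findById(long id) throws SQLException {
        String sql = "SELECT id, patient_name, diagnosis FROM medical_record WHERE id = ?";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null;                        // no matching row
                }
                MedicalRecord record = new MedicalRecord();
                record.id = rs.getLong("id");
                record.patientName = rs.getString("patient_name");
                record.diagnosis = rs.getString("diagnosis");
                return record;
            }
        }
    }
}
```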
Environment: Java, J2EE, JSP, EJB, Ant, Struts 1.2, Log4j, WebLogic 7.0, JDBC, MyEclipse, Windows XP, CVS, Oracle.
