Lead Hadoop Developer Resume
Libertyville, IL
SUMMARY
- 9+ years of professional IT experience, including 5+ years of Hadoop experience, processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture.
- Well experienced with Hadoop ecosystem components such as MapReduce, Cloudera, Hortonworks, Mahout, HBase, Oozie, Hive, Sqoop, Pig, and Flume.
- Experience in using automation tools like Chef for installing, configuring, and maintaining Hadoop clusters.
- Led innovation by exploring, investigating, recommending, benchmarking, and implementing data-centric technologies for the platform.
- Served in a technical leadership role responsible for developing and maintaining the data warehouse and Big Data roadmap, ensuring the data architecture aligns with the business roadmap and analytics capabilities.
- Experienced in Hadoop Architect and Technical Lead roles, providing design solutions and Hadoop architectural direction.
- Strong knowledge of Hadoop cluster connectivity and security.
- Demonstrated ability to understand users' data needs and identify how best to meet them with the application, database, and reporting resources available.
- Strong understanding of data modeling in data warehouse environments, such as star and snowflake schemas.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Strong understanding of relational database structure, enabling complex SQL statements that combine multiple joins and inline views.
- Proficient in writing SQL in Microsoft SQL Server and Oracle environments.
- Created, manipulated, and interpreted reports with specific program data.
- Determined client and practice-area needs and customized reporting systems to meet them.
- Hands-on experience with HDFS, MapReduce, Pig, Hive, AWS, ZooKeeper, Oozie, Hue, Sqoop, Spark, Impala, and Accumulo.
- Good experience with data analytics on distributed computing clusters such as Hadoop, using Apache Spark, Impala, and Scala.
- Worked on various RDBMSs, including Oracle, SQL Server, and MySQL.
- Hands-on experience configuring Flume to load data from multiple sources directly into HDFS, and transferring large datasets between Hadoop and RDBMSs using Sqoop.
- Good experience with NoSQL databases such as MongoDB, Cassandra, and HBase.
- Hands-on experience developing applications on HBase, with expertise in SQL and PL/SQL database concepts.
- Excellent understanding of ETL tools such as Informatica.
- Experienced in using Apache Avro as both a serialization format for persistent data and a wire format for communication between Hadoop nodes.
- Extensive experience in Unix Shell Scripting.
- Expertise in scheduling and monitoring Hadoop workflows using Oozie and ZooKeeper.
- Good knowledge of developing MapReduce programs using Apache Crunch.
- Strong experience as a Java Developer in web/intranet and client/server technologies using Java and J2EE, including the Struts framework, MVC design patterns, JSP, Servlets, EJB, JDBC, JSTL, XML/XSLT, JavaScript, AJAX, JMS, JNDI, RDBMS, SOAP, Hibernate, and custom tag libraries.
- Supported technical team members for automation, installation and configuration tasks.
- An excellent team player and self-starter with good communication and interpersonal skills and proven abilities to finish tasks before target deadlines.
TECHNICAL SKILLS
Big Data: Apache Hadoop, Cloudera, Hive, HBase, Sqoop, Flume, Spark, Pig, HDFS, MapReduce, Oozie, Scala, Impala, Cassandra, ZooKeeper, Apache Kafka, Accumulo, and Apache Storm.
Databases: Oracle, MySQL, MS SQL Server, MS Access; T-SQL, PL/SQL, SSIS, SSRS
Programming Languages: C/C++, Java, Python.
Java Technologies: Java, J2EE, JDBC, JSP, Java Servlets, JMS, JUnit, Log4j.
IDE Development Tools: Eclipse, NetBeans, MyEclipse, SoapUI, Ant.
Operating Systems: Windows, Mac, Unix, Linux.
Frameworks: Struts, Hibernate, Spring.
PROFESSIONAL EXPERIENCE
Confidential, Libertyville, IL
Lead Hadoop Developer
Responsibilities:
- Worked on a Hadoop cluster that ranged from 4-8 nodes during the pre-production stage and was at times extended to 24 nodes in production.
- Built APIs that allow customer service representatives to access the data and answer queries.
- Designed changes to migrate existing Hadoop jobs to HBase.
- Handled fixing of defects efficiently and worked with the QA and BA team for clarifications.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, and managing and reviewing data backups and log files.
- Extended the functionality of Hive and Pig with custom UDFs and UDAFs.
- Developed Spark applications using Scala.
- The new Business Data Warehouse (BDW) improved query/report performance, reduced report development time, and established a self-service reporting model in Cognos for business users.
- Implemented bucketing and partitioning in Hive to assist users with data analysis.
- Used Oozie scripts for application deployment and Perforce as the version control software.
- Implemented partitioning, dynamic partitions, and buckets in Hive (illustrated in the HiveQL sketch after this section).
- Extracted large volumes of data from different sources, performed transformations, and loaded the data into various targets.
- Developed database management systems for easy access, storage, and retrieval of data.
- Performed DB activities such as indexing, performance tuning, and backup and restore.
- Used Sqoop to import data from RDBMSs into the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components (a sample import command is sketched after this section).
- Expertise in writing Hadoop jobs for analyzing data using HiveQL, Pig Latin (a data flow language), and custom MapReduce programs in Java.
- Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Expert in creating Pig and Hive UDFs in Java to analyze data efficiently (see the UDF sketch after this section).
- Responsible for loading data from the BDW Oracle database and Teradata into HDFS using Sqoop.
- Implemented AJAX, JSON, and JavaScript to create interactive web screens.
- Wrote data ingestion systems to pull data from traditional RDBMS platforms such as Oracle and Teradata and store it in NoSQL databases such as MongoDB.
- Involved in creating Hive tables and applying HiveQL to them, which automatically invokes and runs MapReduce jobs.
- Supported applications running on Linux machines.
- Developed data-driven web applications and deployed scripts using HTML5, XHTML, CSS, and client-side JavaScript.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzing them by running Hive queries and Pig scripts.
- Participated in requirements gathering from subject-matter experts and business partners and converted the requirements into technical specifications.
- Used ZooKeeper to manage coordination across the clusters.
- Experienced in analyzing the Cassandra database and comparing it with other open-source NoSQL databases to determine which best suits the current requirements.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs that trigger independently based on time and data availability.
- Assisted application teams with installing Hadoop updates, operating system patches, and version upgrades when required.
- Assisted in cluster maintenance, monitoring, and troubleshooting, and managed and reviewed data backups and log files.
Environment: Apache Hadoop 2.0.0, Pig 0.11, Hive 0.10, Sqoop 1.4.3, Flume, MapReduce, JSP, Struts 2.0, NoSQL, HDFS, Teradata, Linux, Oozie, Cassandra, Hue, HCatalog, Java, IBM Cognos, Oracle 11g/10g, Microsoft SQL Server, Microsoft SSIS, DB2 LUW, TOAD for DB2, IBM Data Studio, AIX 6.1, UNIX scripting
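Illustrative sketches for this role follow. First, a minimal HiveQL sketch of the partitioning and bucketing described above; the table and column names are hypothetical, and the settings reflect Hive 0.10-era configuration:

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.enforce.bucketing=true;

    -- Partitioned by date, bucketed by user for efficient sampling and joins.
    CREATE TABLE events_by_day (user_id BIGINT, action STRING)
    PARTITIONED BY (event_date STRING)
    CLUSTERED BY (user_id) INTO 32 BUCKETS
    STORED AS RCFILE;

    -- Dynamic-partition insert: Hive routes each row to its date partition.
    INSERT OVERWRITE TABLE events_by_day PARTITION (event_date)
    SELECT user_id, action, event_date FROM raw_events;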
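Next, a sketch of the kind of Sqoop import used to land relational data in HDFS; the connection string, credentials, table, and target path are hypothetical:

    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/BDW \
      --username etl_user -P \
      --table ORDERS \
      --target-dir /data/raw/orders \
      --num-mappers 8 \
      --fields-terminated-by '\t'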
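Finally, a minimal sketch of a custom Hive UDF in Java, assuming the Hive 0.10-era org.apache.hadoop.hive.ql.exec.UDF API; the class name and behavior are illustrative only:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    /** Illustrative null-safe UDF that upper-cases a string column. */
    public final class UpperCaseUDF extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null; // Pass NULLs through unchanged.
            }
            return new Text(input.toString().toUpperCase());
        }
    }

Such a function would be registered with ADD JAR and CREATE TEMPORARY FUNCTION to_upper AS 'UpperCaseUDF'; before use in queries.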
Confidential, Seattle, WA
Hadoop Developer
Responsibilities:
- Experienced in adding, installing, and removing components through Ambari.
- Designed and implemented deployment, configuration management, backup, and disaster recovery systems and procedures.
- Responsible for importing log files from various sources into HDFS using Flume.
- Handled Big Data on a Hadoop cluster comprising 40 nodes.
- Performed complex HiveQL queries on Hive tables.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Created technical designs, data models, and data migration strategies, including dimensional data models and data marts.
- Designed, built, and maintained logical and physical databases, dimensional data models, ETL layer designs, and data integration strategies.
- Created final tables in Parquet format.
- Developed Pig scripts for source data validation and transformation.
- Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
- Developed a NoSQL database using CRUD operations, indexing, replication, and sharding in MongoDB.
- Experienced in using the Talend administration console to promote and schedule jobs.
- Extracted data from and loaded data into MongoDB using the mongoexport and mongoimport command-line utilities (sketched after this section).
- Involved in unit testing MapReduce jobs using MRUnit (see the test sketch after this section).
- Utilized Hive and Pig to create BI reports.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several job types, such as Java MapReduce, Hive, Pig, and Sqoop (a minimal workflow sketch follows this section).
- Worked with Informatica MDM to create a single view of the data.
Environment: Cloudera, Hadoop, HDFS, Pig, Hive, MapReduce, Java (JDK 1.7), Flume, Informatica, Oozie, Linux/Unix shell scripting, Avro, MongoDB, Python, Perl, Git, Maven, SAP BW, Cognos, Jenkins.
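Illustrative sketches for this role follow. First, the mongoexport/mongoimport usage mentioned above; host, database, and collection names are hypothetical:

    # Dump a collection to JSON, then load it into a staging collection.
    mongoexport --host dbhost --db analytics --collection users --out users.json
    mongoimport --host dbhost --db analytics --collection users_stage --file users.json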
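Next, a minimal MRUnit test sketch for the MapReduce unit testing described above; TokenMapper is a hypothetical mapper assumed to emit (token, 1) pairs:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.junit.Before;
    import org.junit.Test;

    /** MRUnit test sketch; TokenMapper is a hypothetical word-count-style mapper. */
    public class TokenMapperTest {

        private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

        @Before
        public void setUp() {
            mapDriver = MapDriver.newMapDriver(new TokenMapper());
        }

        @Test
        public void emitsOneCountPerToken() throws Exception {
            // Feed one input record and assert the expected (token, 1) outputs in order.
            mapDriver.withInput(new LongWritable(0L), new Text("hadoop hive"))
                     .withOutput(new Text("hadoop"), new IntWritable(1))
                     .withOutput(new Text("hive"), new IntWritable(1))
                     .runTest();
        }
    }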
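Finally, a skeleton of an Oozie workflow of the kind described above, with a single Hive action; the workflow name and script path are hypothetical:

    <workflow-app xmlns="uri:oozie:workflow:0.4" name="daily-etl">
      <start to="hive-step"/>
      <action name="hive-step">
        <hive xmlns="uri:oozie:hive-action:0.2">
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <script>transform.q</script>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
      </action>
      <kill name="fail">
        <message>Hive step failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
      </kill>
      <end name="end"/>
    </workflow-app>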
Confidential
Java Developer
Responsibilities:
- Involved in analyzing and gathering requirements and user specifications from business analysts.
- Involved in creating use case, class, sequence, and package dependency diagrams using UML.
- Involved in Database Design by creating Data Flow Diagram (Process Model) and ER Diagram (Data Model).
- Used JavaScript for form validations, submissions, and other client-side operations.
- Created stateless session beans to communicate with the client, and created connection pools and data sources (a JNDI lookup sketch follows this section).
- Implemented and supported the project through the development and unit testing phases into the production environment.
- Designed the database and coded SQL, PL/SQL, triggers, and views using IBM DB2.
- Deployed server-side common utilities for the application and front-end dynamic web pages using Servlets, JSP, custom tag libraries, JavaScript, HTML/DHTML, and CSS.
Environment: Java 5.0, J2EE, JSP, HTML/DHTML, CSS, JavaScript, DB2, Windows XP, Struts Framework, Eclipse IDE, WebLogic Server, SQL, PL/SQL.
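A minimal Java 5-era sketch of the connection pool and data source usage above; the JNDI name, table, and column names are hypothetical:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import javax.naming.InitialContext;
    import javax.sql.DataSource;

    /** DAO sketch: obtains a pooled connection via a container-managed JNDI data source. */
    public class AccountDao {

        public String findOwner(long accountId) throws Exception {
            InitialContext ctx = new InitialContext();
            // The JNDI name is an assumption; it would match a WebLogic-configured data source.
            DataSource ds = (DataSource) ctx.lookup("jdbc/AppDS");
            Connection conn = ds.getConnection();
            try {
                PreparedStatement ps = conn.prepareStatement(
                    "SELECT owner_name FROM accounts WHERE account_id = ?");
                ps.setLong(1, accountId);
                ResultSet rs = ps.executeQuery();
                return rs.next() ? rs.getString(1) : null;
            } finally {
                conn.close(); // Returns the connection to the pool.
            }
        }
    }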