Big Data / Sr. Hadoop Developer Resume
Regina, SK
SUMMARY:
- Senior Hadoop Developer with 5+ years of professional IT experience, including 4 years of Big Data experience in data ingestion, data modeling, querying, processing, and analysis, and in implementing enterprise-level systems spanning Big Data and Data Integration.
- Accomplished in using Hadoop ecosystem components such as HDFS, MapReduce, Hive, Impala, ZooKeeper, Oozie, HBase, Sqoop, Pig, Kafka, Spark, and Flume for data storage and analysis.
- Experienced in setting up a secured Kafka cluster; worked on Kafka cluster and topic optimization.
- Experience building real-time streaming applications using Spark Streaming with Kafka.
- Experienced in Hadoop architecture and the Hadoop daemons (NameNode, DataNode, ResourceManager, NodeManager, TaskTracker, JobTracker) and in single-node and multi-node cluster configurations.
- Strong background in Big Data integration and analytics based on Hadoop, Spark, Kafka, Storm, and NoSQL databases.
- Experience in creating, dropping, and altering tables at run time without blocking updates and queries, using HBase and Hive.
- Hands-on experience with partitioning and bucketing in Hive; designed both managed and external Hive tables to optimize performance.
- Proficient in using Apache Sqoop to import and export data between HDFS/Hive and relational databases.
- Hands-on experience with NoSQL databases such as HBase, Cassandra, and MongoDB.
- Strong working knowledge of relational databases such as Oracle, MySQL, and SQL Server.
- Used Flume and Kafka to load log data from multiple sources into HDFS.
- Experienced with job workflow scheduling and coordination tools such as Oozie and ZooKeeper.
- Expertise in creating/managing database objects like Tables, Views, Indexes, Procedures, Triggers and Functions.
- Extensive knowledge of creating PL/SQL stored procedures, packages, functions, and cursors against Oracle (9i, 10g, 11g) and MySQL.
- Experienced in DevOps, Build & Release and Configuration Management on Linux platform.
- Experienced in troubleshooting errors in the HBase shell/API, Pig, Hive, and MapReduce.
- Successfully analyzed data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Experience in processing data using HiveQL, Pig Latin, and shell scripts.
- Created Pig and Hive UDFs in Java to analyze data efficiently (a representative Hive UDF sketch follows this summary).
- Developed pipelines that ingest data from various sources and process it with Hive and Pig.
- Working experience with scripting technologies such as Pig Latin, Python, and shell scripting.
- Skilled in Spark, used for transforming large data sets.
- Good knowledge of web technologies using Core Java.
- Optimized execution of batch jobs over data streams using Spark Streaming.
- Knowledge of reporting tools such as Tableau for analytics on data in the cloud.
- Experienced with source control repositories such as CVS, SVN, and GitHub.
- Experienced in database development, ETL, OLAP, and OLTP.
- Experienced with Agile and Waterfall Software Development Life Cycle (SDLC) methodologies.
- Ability to adapt quickly to new and evolving technologies.
- Good knowledge of data interchange and representation formats such as JSON, XML, Avro, and Parquet.
- Strong experience with web servers such as Tomcat and application servers such as WebLogic, WebSphere, and JBoss.
- Extensive experience in Java and J2EE technologies like Servlets, JSP, JDBC, XML and HTML.
- Worked with Big Data distributions such as Cloudera (CDH3 and CDH4) with Cloudera Manager and Hortonworks with Ambari.
- Good team player with strong interpersonal and communication skills, combined with self-motivation, initiative, and the ability to think outside the box.
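The Hive UDF work mentioned above can be illustrated with a minimal sketch; the class name, input column, and normalization rule below are hypothetical examples rather than code from a specific engagement:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF that normalizes free-text device names before analysis.
// The legacy UDF base class keeps the example short; GenericUDF is the
// richer alternative when complex types are involved.
public final class NormalizeDevice extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                            // pass NULLs through unchanged
        }
        String cleaned = input.toString().trim().toLowerCase();
        return new Text(cleaned.replaceAll("\\s+", "_"));
    }
}
```

Such a UDF would typically be registered with ADD JAR and CREATE TEMPORARY FUNCTION before being used in HiveQL queries.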
TECHNICAL SKILLS:
Big Data: Hadoop, Storm, Trident, HBase, Hive, Flume, Cassandra, Kafka, Sqoop, Oozie, Pig, Spark, MapReduce, ZooKeeper, YARN.
Operating Systems: UNIX, Mac, Linux, Windows 2000/NT/XP/Vista, Android
Programming Languages: Java (JDK 5/JDK 6), C/C++, MATLAB, R, HTML, SQL, PL/SQL, Ruby, Core Java.
Frameworks: Hibernate 2.x/3.x, Spring 2.x/3.x, Struts 1.x/2.x and JPA
Web Services: WSDL, SOAP, Apache CXF/XFire, Apache Axis, REST, Jersey
Databases/Technologies: Oracle 8i/9i/10g, Microsoft SQL Server, DB2 & MySQL 4.x/5.x
Middleware Technologies: WebSphere MQ, WebSphere Message Broker, XML Gateway, JMS
Web Technologies: J2EE, SOAP & REST Web Services, JSP, Servlets, EJB, JavaScript, Struts, Spring, WebWork, Direct Web Remoting, HTML, XML, JMS, JSF, Ajax.
Testing Frameworks: Mockito, PowerMock, EasyMock.
Web/Application Servers: IBM WebSphere Application Server, JBoss, Apache Tomcat.
Other Software: Borland StarTeam, ClearCase, JUnit, Ant, Maven, Android Platform, Microsoft Office, SQL Developer, DB2 Control Center, Microsoft Visio, Hudson, Subversion, Git, Nexus, Artifactory and Trac.
Development Strategies: Agile, SDLC, Lean Agile, Pair Programming, Waterfall and Test-Driven Development.
PROFESSIONAL EXPERIENCE
Big Data / Sr. Hadoop Developer
Confidential, Regina, SK
Responsibilities:
- Maintained system integrity of all sub-components (primarily HDFS, MapReduce, HBase, and Hive).
- Maintained the Hadoop cluster using Knox Gateway and Apache Ranger. Integrated the Hive warehouse with HBase.
- Migrated the required data from MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
- Designed and worked on NiFi flows with Kafka consumer and producer processors to perform transformations and load data into downstream systems, with Kafka as the messaging system.
- Supported Kafka-related issues and helped other application teams optimize flows that use the Kafka messaging system.
- Integrated Apache Storm with Kafka to perform web analytics; loaded clickstream data from Kafka into HDFS, HBase, and Hive via Storm.
- Developed multiple Kafka producers and consumers from scratch using both the low-level and high-level APIs (a producer sketch appears at the end of this section).
- Maintained system integrity of all Hadoop-related sub-components.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
- Wrote HiveQL scripts to create, load, and query tables in Hive.
- Worked with HiveQL on large volumes of log data to perform trend analysis of user behavior across various online modules.
- Gained strong experience and knowledge of real-time data analytics using Spark Streaming, Kafka, and Flume.
- Supported MapReduce programs running on the cluster.
- Monitored system health and logs and responded to warning or failure conditions.
- Created and maintained various DevOps tools for the team, such as provisioning scripts, deployment tools, and staging environments in the AWS cloud.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
- Worked on Big Data integration and analytics based on Hadoop, SOLR, Spark, Kafka, Storm, and webMethods technologies.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Streamed data in real time using Spark with Kafka.
- Migrated MapReduce programs to Spark transformations using Spark and Scala.
- Generated final reporting data using Tableau for testing by connecting to the corresponding Hive tables via the Hive ODBC connector.
- Recommended bringing in Elasticsearch and was responsible for its installation, configuration, and administration.
- Developed and maintained efficient Talend ETL jobs for data ingestion.
- Gained good knowledge of all phases of the iterative Software Development Life Cycle (SDLC).
- Worked on the Talend RTX ETL tool; developed and scheduled jobs in the Talend Integration Suite.
- Modified reports and Talend ETL jobs based on feedback from QA testers and users in development and staging environments.
- Involved in migrating Hadoop jobs into higher environments such as SIT, UAT, and Prod.
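As referenced in the Kafka producers/consumers bullet above, a minimal producer sketch using the standard Kafka Java client is shown below; the broker list, topic name, key, and payload are placeholders rather than values from this engagement:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public final class ClickEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker list; real values came from cluster configuration.
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("acks", "all");                     // wait for full ISR acknowledgement
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key by user id so all events for a user land in the same partition.
            producer.send(new ProducerRecord<>("clickstream", "user-42",
                    "{\"page\":\"/home\",\"ts\":1510000000}"));
            producer.flush();
        }
    }
}
```

A matching consumer would typically use KafkaConsumer from the same client library, subscribing to the topic and polling records in a loop.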
Environment: Hortonworks Hadoop 2.3, HDFS, Hive, HiveQL scripts, Scala, MapReduce, Storm, Spark, Java, HBase, Pig, Sqoop, shell scripts, Oozie Coordinator, Kafka, Flume, MySQL, DevOps, Tableau, SDLC, Elasticsearch, Talend and SFTP.
Hadoop Developer
Confidential, Halifax, Nova Scotia
Responsibilities:
- Created MapReduce programs in Java to parse raw data and populate staging tables (a sketch follows at the end of this section).
- Used Maven extensively to build MapReduce JAR files and deployed them to Amazon Web Services (AWS) on EC2 virtual servers in the cloud; wrote build scripts for continuous integration systems such as Jenkins.
- Implemented GenericWritable to consolidate multiple data sources in the reducer and produce recommendation-based reports using MapReduce programs.
- Worked on Big Data integration and analytics based on Hadoop, SOLR, Spark, Kafka, Storm, and webMethods technologies.
- Involved in managing and reviewing Hadoop log files.
- Developed Apache Storm topologies in Core Java to process large volumes of data in real time.
- Used different Spark modules such as Spark Core, Spark SQL, Spark Streaming, and Spark Datasets.
- Handled end-to-end development of dashboards, including the ETL application written in Ruby and all database schemas for the persistence layers.
- Implemented indexing of logs from Oozie to Elasticsearch.
- Worked on Linux shell scripts for processing and loading data from different systems into HDFS.
- Optimized MapReduce jobs to use HDFS efficiently by applying different compression codecs.
- Experience with the Hortonworks distribution; set up a Hadoop cluster using Elastic MapReduce (EMR), the managed Hadoop framework, on EC2.
- Used an S3 bucket to store the JARs and input datasets, and used DynamoDB to store the processed output from the input datasets.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
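A hedged sketch of the kind of MapReduce job described in the first bullet of this section, with compressed output as mentioned above; the input layout (pipe-delimited lines with the page in the third field), paths, and class names are assumptions for illustration:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

// Hypothetical job: parse raw pipe-delimited log lines and count records per page,
// writing Snappy-compressed output so downstream staging loads stay small.
public class PageCountJob {

    public static class ParseMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text page = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|");
            if (fields.length > 2) {               // skip malformed records
                page.set(fields[2]);               // assumed: third field is the page
                context.write(page, ONE);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "page count");
        job.setJarByClass(PageCountJob.class);
        job.setMapperClass(ParseMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Compress the final output, per the compression bullet above.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```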
Environment: Java, Hadoop, Linux, MapReduce, HDFS, Hive, Shell Scripting, Java (JDK 1.6), Eclipse, SVN, JIRA, SOLR, CDH4, Cloudera Manager, Spark, Kafka, Pig, HBase, Flume, MySQL, Storm, Sqoop, Oozie, Core Java, Elasticsearch, AWS.
Hadoop Developer
Confidential, Montreal, Quebec
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Experience in installing, configuring and using Hadoop Ecosystem components.
- Experience in Importing and exporting data into HDFS and Hive using Sqoop.
- Experienced in defining job flows.
- Knowledge in performance troubleshooting and tuning Hadoop clusters.
- Experienced in managing and reviewing Hadoop log files.
- Participated in the development/implementation of the Cloudera Hadoop environment.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Experience in working with various kinds of data sources such as HBase and Oracle.
- Successfully loaded files to Hive and HDFS from HBase.
- Installed the Oozie workflow engine to run multiple MapReduce programs that run independently based on time and data availability.
- Performed Data scrubbing and processing.
- Responsible for managing data coming from different sources.
- Gained good experience with NoSQL databases.
- Experience in working with Flume to load log data from multiple sources directly into HDFS.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (a HiveServer2 JDBC sketch follows at the end of this section).
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Implemented best income logic using Pig scripts.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
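The Hive table creation, loading, and querying described above can be sketched against the HiveServer2 JDBC driver; the connection URL, table, columns, and HDFS paths below are hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Minimal sketch of creating, loading, and querying a Hive table over the
// HiveServer2 JDBC driver. The URL, table name, and HDFS paths are illustrative.
public class HiveTableDemo {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://hiveserver2-host:10000/default", "hadoop", "");
             Statement stmt = conn.createStatement()) {

            // External table over raw clickstream files already sitting in HDFS.
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS clicks_raw ("
                       + " user_id STRING, page STRING, ts BIGINT)"
                       + " ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'"
                       + " LOCATION '/data/raw/clicks'");

            // Load an additional file into the table's location.
            stmt.execute("LOAD DATA INPATH '/landing/clicks/day01.txt'"
                       + " INTO TABLE clicks_raw");

            // The query runs on the cluster as MapReduce jobs under the hood.
            try (ResultSet rs = stmt.executeQuery(
                     "SELECT page, COUNT(*) AS hits FROM clicks_raw GROUP BY page")) {
                while (rs.next()) {
                    System.out.println(rs.getString("page") + "\t" + rs.getLong("hits"));
                }
            }
        }
    }
}
```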
Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Datameer, PIG, Zookeeper, Sqoop, Oozie, HBase, CentOS, SOLR.
Big data / Hadoop Developer
Confidential, Oshawa, ON
Responsibilities:
- Gathered the business requirements from the business partners and subject matter experts.
- Worked with the Data Modeler and DBAs to build the data model and table structures.
- Actively participated in discussion sessions to design the ETL job flow.
- Worked with 10+ source systems and received batch files from heterogeneous systems such as UNIX, Windows, Oracle, mainframe, and DB2. Extensively used Informatica to load data from a wide range of sources such as Oracle, SQL Server, Teradata, and flat files into a DB2 database.
- Handled 20 TB of data volume with 10 Node cluster in Test environment.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
- Managed and reviewed the Hadoop log files.
- Supported HBase architecture design with the Hadoop architect team to develop a database design in HDFS.
- Supported MapReduce programs running on the cluster and wrote MapReduce jobs using the Java API.
- Involved in HDFS maintenance and loading of structured and unstructured data.
- Imported data from mainframe datasets to HDFS using Sqoop. Also handled importing data from various data sources (Oracle, DB2, Cassandra, and MongoDB) into Hadoop and performed transformations using Hive and MapReduce.
- Wrote Hive queries for data analysis to meet the business requirements. Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions. Streamed data in real time using Spark with Kafka for faster processing.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data to HDFS using Scala (a streaming sketch follows at the end of this section).
- Wrote Python scripts for internal testing that read data from a file and push it into a Kafka queue, which is in turn consumed by the Storm application.
- Worked on Kafka and Kafka mirroring to ensure that data is replicated without loss.
- Utilized Agile Scrum methodology to help manage and organize a team of 4 developers, with regular code review sessions.
- Participated in building a CDH4 test cluster to implement Kerberos authentication. Upgraded the Hadoop cluster from CDH4 to CDH5 and set up a high-availability cluster to integrate Hive with existing applications.
- Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior.
- Implemented Storm builder topologies to perform cleansing operations before moving data into Cassandra.
- Deployed various topologies into the Storm cluster based on the business use cases.
- Built a prototype with HDP Kafka and Storm for a clickstream application.
- Updated mappings, sessions, and workflows as part of ETL changes; also modified existing ETL code and documented the changes.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
- Familiar with Scala, including closures, higher-order functions, and monads.
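The Kafka-to-HDFS streaming path described above was implemented in Scala per the bullet; the hedged sketch below shows the same direct-stream pattern using the Spark Streaming Kafka 0-10 Java API, with the broker list, topic, group id, and output path as placeholders:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

// Sketch: consume a Kafka topic with the direct stream API and persist each
// micro-batch to HDFS. All connection details are illustrative placeholders.
public class KafkaToHdfsStreaming {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("kafka-to-hdfs");
        JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(30));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "clickstream-loader");
        kafkaParams.put("auto.offset.reset", "latest");

        Collection<String> topics = Arrays.asList("clickstream");
        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        // Keep only the message payloads and write each batch to a timestamped directory.
        JavaDStream<String> values = stream.map(ConsumerRecord::value);
        values.foreachRDD((rdd, time) -> {
            if (!rdd.isEmpty()) {
                rdd.saveAsTextFile("hdfs:///data/streaming/clicks/" + time.milliseconds());
            }
        });

        jssc.start();
        jssc.awaitTermination();
    }
}
```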
Environment: Hadoop, Java, MapReduce, HDFS, HBase, Hive, Pig, Linux, XML, Eclipse, Kafka, Storm, Spark, Cloudera, CDH4/5 Distribution, DB2, SQL Server, Oracle 11i, MySQL, Informatica PowerCenter 8.1, Informatica PowerConnect.
