Hadoop Developer Resume
Oklahoma
PROFESSIONAL SUMMARY:
- Overall 7+ years of progressive IT experience in Analysis, Design, Development, Implementation and Testing of software applications using Big Data Technologies and Java based technologies. Complete life cycle (SDLC) experience of a product, including System Analysis, Architecture, Technical design, development, testing, deployment and support of medium- to large-scale business applications using Agile Scrum and iterative development methodologies.
- Experience in design and deployment of Enterprise Application Development, Web Applications, Client-Server Technologies, Web Programming using Java and Big data technologies.
- Experience in Installation and Configuring Hadoop Stack elements MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Oozie and ZooKeeper.
- Expertise in Database design, creation and management of schemas, writing of stored procedures, Functions, DDL, DML, SQL queries.
- Excellent understanding of distributed storage systems like HDFS and batch processing systems like MapReduce and Yarn.
- Good hands-on experience on all the major distributions of Hadoop like MapR, Hortonworks, Cloudera.
- Used Apache NiFi for ingestion of data from IBM MQ (Message Queue).
- Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
- Experience in designing Hive schemas, using performance tuning techniques like partitioning, bucketing etc.
- Experience in importing and exporting data using Sqoop from Relational Database Management Systems (RDBMS) to HDFS and vice versa.
- Experience in scheduling jobs through Oozie and Zena.
- Experience in writing Ad-hoc queries for moving data between HDFS and Hive and analysing the data using HiveQL.
- Experience in writing and extending MapReduce User Defined Functions (UDFs) in Java, Hive and Pig.
- Experience in fetching data into Hadoop Data Lake from various databases like SQL Server, DB2, Oracle and Teradata using Sqoop.
- Designed, configured and deployed Amazon Web Services (AWS) for a multitude of applications utilizing the AWS stack (including EC2, Glue, Data Pipeline, EMR, SNS, S3, RDS, CloudWatch, SQS, IAM), focusing on high availability, fault tolerance, and auto-scaling.
- Worked with AWS Glue jobs to transform data to a format that optimizes query performance for Athena.
- Experienced in working with different file formats like Parquet, RC Files, ORC Files, Sequence Files, Text files, XML, JSON and Avro.
- Experience in loading messages from JMS Queue and logs from multiple sources into HDFS using Flume.
- Experience with Talend Data Fabric for big data and data integration.
- Experience in processing of real-time data using SparkSQL and Scala.
- Good knowledge of integrating Spark Streaming with Kafka for real-time processing of streaming data.
- Performed real-time event processing of data from multiple servers in the organization using Apache Storm integrated with Apache Kafka.
- Good hands-on knowledge of the statistical tool IBM SPSS.
- Migrated a Hadoop cluster to AWS and defined different read/write strategies.
- Extensive experience working on Oracle, DB2, SQL Server, PL/SQL and MySQL databases.
- Extensive knowledge of designing, managing and monitoring workflows with the Informatica PowerCenter tool.
- Experience in writing scripts using UNIX Shell script and Scala REPL.
- Proficient in working with NoSQL databases like HBase.
- Experience in using Kerberos for authenticating the end users using Hadoop in a secure mode.
- Experience in creating latency dashboards by using Splunk.
- Experience in using application servers like JBoss and WebLogic.
- Involved in unit testing of MapReduce programs using MRUnit and JUnit.
- Experience in working with build management tools like Maven and Ant and Integration tools like Jenkins.
- Worked in complete Software Development Life Cycle (analysis, design, development, testing, implementation and support) using Waterfall and Agile Methodologies.
- Good knowledge of the Finance and Health Care domains.
TECHNICAL SKILLS:
Programming Languages: Scala, UNIX Shell Scripting.
Hadoop / Big Data Stack: Hadoop, HDFS, YARN, MapReduce, Hive, Pig, Unix, Scala, Kafka, Oozie, ZooKeeper, Zena, Spark, Spark-SQL, Sqoop, Flume, Storm, Flink, HBase, Splunk, Talend, CDC.
Hadoop Distributions: Cloudera, Hortonworks, MapR
AWS Tools: AWS EC2, S3, CloudFormation, EMR, CodePipeline, Glue, Kinesis, DynamoDB, RDS, Redshift.
Databases: Oracle, MySQL, DB2, Teradata, SQL Server, Sybase.
NoSQL Databases: HBase.
Query Languages: HiveQL, SQL
IDEs: Eclipse, NetBeans, IntelliJ.
Build & Integration Tools: Maven, Jenkins.
Operating Systems: Windows, Linux, Unix and CentOS.
Version Control Systems: SVN, Git, CVS.
PROFESSIONAL EXPERIENCE:
Confidential, Oklahoma
Hadoop Developer
Responsibilities:
- Developed business logic using Scala.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Implemented File Transfer Protocol operations using Talend Studio to transfer files between network folders.
- Created Hive UDFs in Java, compiled them into JARs, added them to HDFS and executed them with Hive queries.
- Developed Hive queries for the analysts.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Configured big data workflows to run on top of Hadoop using Oozie; these workflows comprise heterogeneous jobs like Pig, Hive and Sqoop, with cluster coordination services through ZooKeeper.
- Loaded and transformed large sets of structured data from Oracle and SQL Server into HDFS using Talend Big Data Studio.
- Hands-on experience with cloud services like Amazon Web Services (AWS).
- Worked on migration of Oozie workflows into Apache Airflow DAGs.
- Used Airflow workflows to automate jobs on Amazon EMR.
- Worked with PySpark to improve the performance of and optimize existing applications, moving them from an EMR cluster to AWS Glue.
- Wrote AWS Glue scripts to extract, transform and load the data.
- Worked with the Parquet data format to transform data faster and optimize query performance for Athena.
- Worked on Bitbucket, Git and Bamboo to deploy EMR clusters.
- Worked with different file formats like TEXTFILE, AVROFILE, ORC and PARQUET for Hive querying and processing.
- Worked on different open source J2EE technologies like Spring Core, Spring JDBC, Spring Data and Spring Boot.
- Consumed the data from Kafka queues using Spark.
- Managed and reviewed Hadoop log files.
- Integrated Hive tables with HBase to perform row-level analytics.
- Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml based upon the job requirement.
- Analysed the metadata as per the requirements.
- Loaded data into HDFS from the local system.
- Wrote Pig scripts for processing the data.
- Created Hive tables with schemas and loaded the data using Sqoop.
- Wrote Hive queries to map the data residing in HDFS.
- Wrote Java code for MapReduce jobs.
Environment: Hadoop, MapReduce, HDFS, Hive, Java, Pig, Oozie, Talend, AWS, Spark, Scala.
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Expertise in designing and deploying a Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Flume, Spark and Kafka with the MapR distribution.
- Involved in importing data from various formats like MapR-DB JSON and XML into the HDFS environment. Involved in the transfer of data from post log tables into HDFS and Hive using Sqoop.
- Worked with NoSQL databases like HBase to create tables and store the data. Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Wrote Pig scripts to store the data into HBase.
- Created Hive tables, dynamic partitions and buckets for sampling, and worked on them using HiveQL.
- Stored the data in tabular formats using Hive tables and Hive SerDe.
- Exported the analyzed data to Teradata using Sqoop for visualization and to generate reports for the BI team. Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
- Used Spark Streaming to collect data from Kafka in near real time, perform the necessary transformations and aggregations on the fly to build the common learner data model, and persist the data in a NoSQL store (HBase).
- Designed the solution using Storm Spouts (to stream data from Kafka) and Bolts connecting to Java APIs developed independently based on the application logic.
- Ingested streaming data with Apache NiFi into Kafka.
- Wrote shell scripts and successfully migrated data from on-premises to AWS EMR (S3).
- Worked with NiFi for managing the flow of data from sources through automated data flow.
- Involved in Installing Hadoop Ecosystem components.
- Set up Hadoop cluster environment administration, including adding and removing cluster nodes, cluster capacity planning and performance tuning.
- Wrote complex MapReduce programs.
- Involved in HDFS maintenance and administering it through the Hadoop Java API.
- Configured Fair Scheduler to provide service level agreements for multiple users of a cluster.
- Loaded data into the cluster from dynamically generated files using Flume and from RDBMS using Sqoop.
- Involved in writing Java APIs for interacting with HBase.
- Involved in writing Flume and Hive scripts to extract, transform and load data into the database.
- Used HBase for data storage.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Experienced in installing, configuring and using Hadoop Ecosystem components.
- Experienced in Importing and exporting data into HDFS and Hive using Sqoop.
- Knowledge in performance troubleshooting and tuning Hadoop clusters.
- Participated in development/implementation of Cloudera Hadoop environment.
- Implemented partitioning, dynamic partitions and buckets in Hive for efficient data access.
- Experienced in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
- Installed and configured Hive, wrote Hive UDFs, and used MapReduce and JUnit for unit testing.
- Developed and delivered quality services on-time and on-budget. Solutions developed by the team use Java, XML, HTTP, SOAP, Hadoop, Pig and other web technologies.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
Environment: Linux Ubuntu 10.10/14.04/16.04, NiFi, MapR, HBase, Hive, Sqoop, Spark, Kafka, Flume, Pig, AWS.
Confidential
Java Developer
Responsibilities:
- Followed MVC architecture to develop the web applications.
- Developed a user-friendly interface that lets users with a basic knowledge of English interact with the system.
- Involved in requirement gathering and analysis of the project.
- Created the service design document, which contains UML use case diagrams, class diagrams, sequence diagrams and activity diagrams.
- Developed various Java classes, SQL queries and procedures to retrieve and manipulate the data from backend Oracle database using JDBC.
- Analysed business requirements and developed the system architecture document for the enhancement project.
- Developed database interaction code using the JDBC API, making extensive use of SQL query statements and advanced prepared statements.
- Prepared the installation guide, customer guide and configuration document, which were delivered to the customer along with the product.
Environment: Windows, Java, UNIX, Oracle using Toad.
Confidential
Linux Administrator
Responsibilities:
- Installation, configuration, upgrade and administration of Sun Solaris and RedHat Linux.
- User account management and support.
- Jumpstart & Kick-start OS integration, DDNS, DHCP, SMTP, Samba, NFS, FTP, SSH, LDAP integration.
- Network traffic control, IPsec, QoS, VLAN, Proxy, RADIUS integration on Cisco Hardware via Red Hat Linux Software.
- Responsible for configuring and managing Squid server in Linux.
- Configuration and Administration of NIS environment.
- Managing file systems and disk management using Solstice DiskSuite.
- Involved in installing and configuring NFS.
- Worked on Solaris Volume Manager to create file systems as per user and database requirements.
- Troubleshooting system and end-user issues.
- Responsible for configuring real time backup of web servers.
- Managed log files for troubleshooting and identifying probable errors.
- Responsible for reviewing all open tickets, and resolving and closing any existing tickets.
- Documented solutions for any issues that had not been encountered previously.
Environment: Sun Solaris 2.6/7, SUN Ultra Enterprise 6000/450, SUN Ultra 10/5/2/1, Windows NT 4.0, RHEL 4.x.
