
Hadoop/Big Data Developer Resume

San Jose, CA

SUMMARY:

  • Around 8 years of IT experience, including designing, implementing, and configuring the Hadoop ecosystem, with expertise in delivering network optimization solutions.
  • 6 years of experience as a Hadoop developer with extensive knowledge of Hive, Pig, Sqoop, Flume, Spark, PySpark, Scala, HBase, Oozie, ZooKeeper, Impala, and Cassandra.
  • Experience developing custom MapReduce and YARN programs in Java and Python to process large data volumes as required.
  • Extensive knowledge of importing and exporting data between RDBMS (Relational Database Management Systems) and HDFS using Sqoop.
  • Worked on Google Cloud Platform services such as the Vision API and Compute instances.
  • Worked with various file formats including JSON, CSV, Avro, SequenceFile, text, and XML.
  • Good knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Experience administering clusters using Ambari and Cloudera Manager.
  • Experience in operational intelligence using Splunk.
  • Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancer, Auto Scaling groups, and the AWS CLI.
  • Experience analyzing data using Pig Latin scripts and Hive Query Language; experience with Apache NiFi and integrating NiFi with Apache Kafka.
  • Worked on backend projects related to ETL and data migration; developed shell scripts to automate DBA tasks.
  • Responsible for designing logical and physical data models for various data sources on Amazon Redshift; worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
  • Able to combine conceptual, business, and analytical skills to present capacity solutions and deliver results-oriented analysis.

TECHNICAL SKILLS:

Hadoop Technologies: MapReduce, HDFS, YARN, Hive, Pig, Sqoop, Flume, Zookeeper, HBase, Spark, Kafka, Impala, Oozie

Programming Languages: Java, Python, SQL, PL/SQL, C, UNIX Shell Scripting, HTML

Frameworks: Hibernate 2.x/3.x, Spring 2.x/3.x, Struts 1.x/2.x

Database Systems: Oracle, MySQL, PostgreSQL, Teradata, HBase, Cassandra, MongoDB

Web Technologies: WebLogic, WebSphere, HTML5, CSS, JavaScript, jQuery, AJAX, Servlets, JSP, JSON, XML, XHTML, SOAP and REST web services

IDE Tools: Eclipse, NetBeans, RAD

Visualization Tools: Tableau

Operating Systems: Windows XP, 7, 10, Linux, Unix

PROFESSIONAL EXPERIENCE:

Confidential, San Jose, CA

Hadoop/Big Data Developer

Responsibilities:

  • Performed joins, group by and other operations in Hive.
  • Wrote and executed Pig scripts using the Grunt shell.
  • Wrote PySpark scripts in Python for processing huge data sets.
  • Worked extensively with DataFrames for large-scale data manipulation.
  • Used the REST API to access HBase data for analytics.
  • Migrated ETL jobs to Pig scripts to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
  • Worked on the Google Vision API; created a Vision application in Python for extracting information from Confidential’s internal data (images, vCards, etc.).
  • Designed and created ETL jobs through Talend to load huge volumes of data into Cassandra, the Hadoop ecosystem, and relational databases.
  • Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which best suited the streaming requirements.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Worked on log parsing and created well-structured search queries to minimize performance issues.
  • Worked on sending data to Splunk Enterprise using the HTTP Event Collector (HEC).
  • Used the Oozie workflow engine for job scheduling.
  • Performed joins, group-bys, and other operations in MapReduce using Java and Pig.
  • Collaborated effectively with the team to complete big data tasks, deliver projects on time, and identify optimal ways to process each task.
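The HEC integration above can be sketched in Python; the endpoint URL and token below are placeholders, not values from this project, and the event fields are illustrative:

```python
import json

# Hypothetical HEC endpoint and token -- placeholders only.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"

def build_hec_event(message, source, sourcetype="_json", index="main"):
    """Build the JSON body that Splunk's HTTP Event Collector expects."""
    return json.dumps({
        "event": message,
        "source": source,
        "sourcetype": sourcetype,
        "index": index,
    })

def build_hec_headers(token):
    """HEC authenticates with a 'Splunk <token>' Authorization header."""
    return {"Authorization": f"Splunk {token}",
            "Content-Type": "application/json"}

# Example event: an illustrative job-status record.
payload = build_hec_event({"job": "hive_agg", "status": "ok"}, source="hadoop")
headers = build_hec_headers(HEC_TOKEN)
# A POST of `payload` with `headers` to HEC_URL would deliver the event.
```

The actual delivery is a plain HTTPS POST, so any HTTP client can send `payload` with these headers.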

Environment: Apache Hadoop 2.2.0, Cloudera, Hue, MapReduce, Hive, HBase, HDFS, Cassandra, Pig, Sqoop, Oozie, PySpark, UNIX, Splunk 6.6.4, Google Vision API, Python.

Confidential, Kent, WA

Hadoop/Spark Developer

Responsibilities:

  • Developed distributed-computing big data applications using open-source frameworks such as Apache Spark, Apex, Flink, Storm, NiFi, and Kafka.
  • Developed Spark scripts using Scala shell commands and Java as per the requirements.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS; used a big data tool to load large volumes of source files from S3 into Redshift.
  • Used the REST API to access HBase data for analytics.
  • Designed and implemented the ETL process in Hadoop.
  • Migrated ETL jobs to Pig scripts to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
  • Created UDFs to encode client-sensitive data, stored it in HDFS, and performed evaluation using Pig.
  • Worked with NiFi to manage the flow of data from source systems to HDFS.
  • Created Talend jobs to copy files from one server to another, utilizing Talend FTP components.
  • Designed the web-based structure for business analytics and data visualization in the Hadoop ecosystem; integrated Tableau on the Hadoop framework to visualize and analyze data.
  • Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which best suited the streaming requirements.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Created, modified, and executed DDL and ETL scripts for de-normalized tables to load data into Hive and AWS Redshift tables.
  • Used the Oozie workflow engine for job scheduling.
  • Performed joins, group-bys, and other operations in MapReduce using Java and Pig.
  • Collaborated effectively with the team to complete big data tasks, deliver projects on time, and identify optimal ways to process each task.
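The Sqoop export step above can be illustrated by assembling the command line in Python; the JDBC URL, table, and HDFS path are hypothetical examples, and nothing is executed here:

```python
import subprocess  # only needed if the command is actually launched

def sqoop_export_cmd(hdfs_dir, jdbc_url, table, num_mappers=4):
    """Assemble a `sqoop export` invocation pushing HDFS data to an RDBMS.

    All connection details below are illustrative placeholders.
    """
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--table", table,
        "--export-dir", hdfs_dir,
        "--num-mappers", str(num_mappers),
        "--input-fields-terminated-by", "\t",
    ]

cmd = sqoop_export_cmd(
    "/user/hive/warehouse/daily_agg",          # analyzed data in HDFS
    "jdbc:mysql://reports-db.example.com/bi",  # hypothetical BI database
    "daily_agg",
)
# subprocess.run(cmd, check=True)  # would launch the export on a real cluster
```

On a real cluster the same flags are passed directly on the shell command line; building the argument list first just makes the job easy to schedule from Oozie shell actions or scripts.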

Environment: Apache Hadoop 2.2.0, HDP 2.2, Ambari, MapReduce, Hive, Java, HBase, HDFS, Cassandra, NiFi, AWS, Pig, Sqoop, Oozie, Java 1.7, UNIX, Shell Scripting, XML.

Confidential, Houston, TX

Hadoop Developer

Responsibilities:

  • Provided technical designs and architecture; supported automation, installation, and configuration tasks; and planned Hadoop cluster system upgrades.
  • Developed a data pipeline using Flume and Sqoop to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Performed a comparative analysis of Hive vs. Impala.
  • Maintained Hadoop clusters for dev/staging/production; trained the development, administration, testing, and analysis teams on the Hadoop framework and ecosystem.
  • Involved in the complete implementation lifecycle; specialized in writing custom MapReduce programs in Java as well as Pig and Hive scripts.
  • Successfully integrated Hive tables and MongoDB collections and developed a web service that queries a MongoDB collection and returns the required data to the web UI.
  • Collected and aggregated large amounts of web log data from sources such as web servers, mobile, and network devices using Apache Flume, and stored the data in HDFS for analysis.
  • Implemented HBase coprocessors to notify the support team when data is inserted into HBase tables.
  • Developed UNIX shell scripts for creating reports from Hive data.
  • Used Java MapReduce to compute metrics that define user experience, revenue, etc.
  • Involved in developing Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
  • Defined business and technical requirements; designed a proof of concept for evaluating AFMS agencies' data evaluation criteria and scoring; and selected data integration and information management tooling.
  • Integrated big data technologies and analysis tools into the overall architecture.
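The per-user metric computation above followed the standard map/shuffle/reduce pattern; a minimal plain-Python sketch of that logic (the records and field names are illustrative, not data from the project):

```python
from itertools import groupby
from operator import itemgetter

# Illustrative log records; in the actual job these came from HDFS.
records = [
    {"user": "u1", "revenue": 10.0},
    {"user": "u2", "revenue": 5.0},
    {"user": "u1", "revenue": 7.5},
]

# Map phase: emit (key, value) pairs, keyed by user.
pairs = [(r["user"], r["revenue"]) for r in records]

# Shuffle phase: group pairs by key (sorting first, as the framework does).
pairs.sort(key=itemgetter(0))

# Reduce phase: sum revenue per user.
revenue_by_user = {
    user: sum(value for _, value in group)
    for user, group in groupby(pairs, key=itemgetter(0))
}
# revenue_by_user == {"u1": 17.5, "u2": 5.0}
```

In the Java MapReduce version, the mapper emits the `(user, revenue)` pairs, the framework performs the sort/shuffle, and the reducer does the summation.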

Environment: Hadoop, Cassandra, HBase, HDFS, MapReduce, Hive, Pig, Java, Sqoop, Flume, Oozie, JSP, RMI, JNDI, JDBC, Tomcat, Apache, Shell Scripting.

Confidential

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop; worked hands-on with the ETL process using Pig.
  • Worked on data analysis in HDFS using MapReduce, Hive, and Pig jobs.
  • Worked on MapReduce programming and HBase.
  • Involved in creating external tables and partitioning and bucketing tables.
  • Ensuring adherence to guidelines and standards in project process.
  • Facilitating testing in different dimensions.
  • Used crontab to automate scripts.
  • Wrote and modified stored procedures to load and modify data according to business rule changes.
  • Worked on production support environment.
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Developed Hive queries to process the data and generate the data cubes for visualizing.
  • Implemented Kerberos security to safeguard the cluster.
  • Worked on a stand-alone as well as a distributed Hadoop application.
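The external-table work above combines partitioning and bucketing in the table DDL. A sketch that assembles the HiveQL as a Python string; the table, column, and path names are placeholders, not the project's actual schema:

```python
def external_table_ddl(table, location, partition_col="load_date",
                       bucket_col="customer_id", num_buckets=16):
    """Illustrative HiveQL for an external, partitioned, bucketed table.

    Partitioning prunes directories at query time; bucketing hashes rows
    into a fixed number of files per partition for sampling and joins.
    """
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {bucket_col} STRING,\n"
        f"  amount DOUBLE\n"
        f")\n"
        f"PARTITIONED BY ({partition_col} STRING)\n"
        f"CLUSTERED BY ({bucket_col}) INTO {num_buckets} BUCKETS\n"
        f"STORED AS ORC\n"
        f"LOCATION '{location}';"
    )

ddl = external_table_ddl("txn_history", "/data/warehouse/txn_history")
# `ddl` can be submitted via beeline / the Hive CLI.
```

Note the HiveQL ordering: the partition column is declared in `PARTITIONED BY`, not in the column list, and `CLUSTERED BY ... INTO n BUCKETS` precedes the storage and location clauses.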

Environment: Apache Hadoop, Cloudera, Pig, Hive, Sqoop, Flume, Java/J2EE, Oracle 11g, crontab, JBoss 5.1.0 Application Server, Linux OS, Windows OS, AWS.

Confidential

Software Engineer

Responsibilities:

  • Analyzed the specifications provided by the clients.
  • Prepared and updated documents (technical and UI).
  • Designed and developed business components and front end using JSP and Servlets.
  • Implemented Struts framework in the presentation tier for all the essential control flow, business level validations and for communicating with the business layer.
  • Coded HTML pages, Struts, and JSP.
  • Performed front-end validation using JavaScript.
  • Developed required PL/SQL scripts.
  • Involved in testing, debugging, bugs fixing and documentation of the system.
  • Performed unit testing and performance testing.
  • Used CVS for configuration management and version control.

Environment: HTML, CSS, Apache Tomcat server, Java, JSP, Servlets, JavaScript, JDBC, TOAD, Eclipse, ANT, CVS, UNIX.
