Big Data Hadoop Developer Resume
TX
SUMMARY:
- Over 5 years of total software development experience with the Hadoop ecosystem, Big Data and data science analytical platforms, Java/J2EE technologies, database management systems, and enterprise-level cloud-based computing and applications.
- Around 4 years of experience in the design and implementation of Big Data applications using the Hadoop stack: MapReduce, Spark, Scala, Hive, Pig, Oozie, Sqoop, Flume, HBase, and NoSQL databases.
- Hands-on experience in writing complex MapReduce jobs, Pig scripts, and Hive data models.
- Experience creating batch-style distributed computing applications using Apache Spark and Flume.
- Hands-on experience doing analytics using Spark SQL.
- Hands-on experience with, and in-depth understanding of, Hadoop architecture, frameworks, and various components.
- Experience and in-depth understanding of analyzing data using HiveQL and Pig.
- Worked extensively with Hive DDLs and Hive Query Language (HQL); developed UDF, UDAF, and UDTF functions and used them in Hive queries.
- Good experience in data retrieval and processing using Hive and P
- Involved in transferring data between HDFS and RDBMS in both directions using Sqoop.
- Strong experience in developing MapReduce programs using the Cloudera distribution of Apache Hadoop.
- Working experience with Linux distributions such as Red Hat and CentOS.
- Good experience in creating and designing data ingest pipelines using technologies such as Apache Storm and Kafka.
- Wrote Python code to retrieve and manipulate data.
- Hands-on experience in Azure Cloud Services (PaaS & IaaS), Storage, Web Apps, Active Directory, Application Insights, and Logic Apps.
- Experience working with Azure Monitoring, Data Factory, Traffic Manager, Service Bus, and Key Vault.
- Designed and developed Cloud Service projects and deployed them to Web Apps, PaaS, and IaaS.
- Familiar with NoSQL technologies such as MongoDB for extracting and loading large volumes of data.
- Extracted and loaded data in MongoDB using the mongoimport and mongoexport command-line utilities.
- Experience in writing shell scripts to dump shared data from MySQL servers to HDFS.
- Worked on implementing and optimizing Hadoop/MapReduce algorithms for Big Data analytics.
- Good knowledge of AWS infrastructure services, including Amazon Simple Storage Service (Amazon S3), EMR, and Amazon Elastic Compute Cloud (Amazon EC2).
- Extensive knowledge of cloud-based technologies on Amazon Web Services (AWS), including VPC, EC2, Route 53, S3, DynamoDB, ElastiCache, Glacier, RRS, CloudWatch, CloudFront, Kinesis, Redshift, SQS, SNS, and RDS.
- Experience in performance tuning and monitoring of Hadoop clusters by gathering and analyzing data on the existing infrastructure using Cloudera Manager.
TECHNICAL SKILLS:
Big Data (Hadoop Framework): Cloudera Distribution, HDFS, MapReduce, YARN, Pig, Hive, Flume, Oozie, ZooKeeper, HBase, Sqoop, Spark, Scala, Kafka, Storm, Apache Phoenix, DataNode, NameNode, ResourceManager
Databases: MySQL, Oracle (SQL, PL/SQL), IBM DB2, MS Access
NoSQL Databases: HBase, MongoDB 3.0.1, Cassandra
Languages: SQL, Java, J2EE, Python, Pig scripting, C
Scripting Languages: JavaScript, JSP 2.0/1.2, jQuery, JSON, HTML5, Linux & Unix scripts, XML
ETL: Talend ETL, Talend Studio
Web Technologies: JSP, Servlets, JavaBeans, JDBC, AWT, Swing, JSF, XML, AWS, AJAX, SOAP, XSLT
IDE: Eclipse, NetBeans, IBM RAD
XML Technologies: XML, XSLT
Operating Systems: Windows XP, 2007 Professional, 7, 8 & 10, UNIX, Linux, CentOS, Ubuntu, Red Hat Linux
PROFESSIONAL EXPERIENCE:
Confidential, San Antonio, TX
Big Data Hadoop Developer
Responsibilities:
- Used Spark to optimize and improve the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Performed advanced procedures such as text analytics and processing in Scala, using Spark's in-memory computing capabilities.
- Developed UDFs in Java for Hive and Pig.
- Incubated, administered, and operated Apache NiFi.
- Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the results in Parquet format in HDFS.
- Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline system.
- Configured SQL Server Master Data Services (MDS) in Windows Azure IaaS.
- Worked with Azure Storage, SQL Azure, and various PaaS solutions, including Web and Worker Roles and Azure Web Apps.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS.
- Used Spark and Spark SQL to read Parquet data and create tables in Hive using the Scala API.
- Experienced in using the Spark application master to monitor Spark jobs and capture their logs.
- Developed multiple Kafka Producers and Consumers as per the software requirement specifications.
- Used Spark Streaming APIs to perform on-the-fly transformations and actions to build the common learner data model, which receives data from Kafka in near real time and persists it to Cassandra.
- Experience in building real-time data pipelines with Kafka Connect and Spark Streaming.
- Used Kafka and Kafka brokers, initialized the SparkContext, and processed live streaming data with RDDs; also used Kafka to load data into HDFS and NoSQL databases.
- Used ZooKeeper to store offsets of messages consumed for a specific topic and partition by a specific consumer group in Kafka.
- Used Kafka features such as distribution, partitioning, and the replicated commit log service to maintain feeds for messaging systems, and created applications that monitor consumer lag within Apache Kafka clusters.
- Developed end-to-end data processing pipelines that begin with receiving data through the Kafka distributed messaging system and end with persisting the data into Cassandra.
- Involved in Cassandra cluster planning, with a good understanding of Cassandra cluster mechanisms, including replication strategies, snitches, gossip, consistent hashing, and consistency levels.
- Responsible for development with the Spark Cassandra Connector to load data from flat files into Cassandra for analysis; modified the cassandra.yaml and cassandra-env.sh files to set various configuration properties.
- Used Sqoop to import data into Cassandra tables from relational databases such as Oracle and MySQL, designed column families in Cassandra, performed data transformations, and exported the transformed data to Cassandra per business requirements.
- Worked on the Hortonworks HDP 2.5 distribution.
- Developed efficient MapReduce programs to filter out unstructured data, and developed multiple MapReduce jobs to perform data cleaning and preprocessing on Hortonworks.
- Used Apache Falcon on Hortonworks for data management and pipeline processing in the Hadoop cluster.
Environment: HDP 2.3.4, Hadoop, Hive, HDFS, Spark, Spark SQL, Spark Streaming, Scala, Kafka, AWS, Cassandra, Hortonworks, ELK, Java, and Agile methodologies.
Confidential, Oak Brook, IL
Big Data Hadoop Developer
Responsibilities:
- Developed a data pipeline using Flume and Sqoop to extract data from weblogs and store it in HDFS.
- Wrote Hive UDFs, and installed and configured Hive.
- Involved in writing Hive queries to meet the business requirements of the clients.
- Transferred processed data from HDFS to RDBMS and other external file systems using Sqoop.
- Developed a data pipeline using Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Developed custom directives and Services in AngularJS.
- Developed entire frontend and backend modules using Python on the Django web framework.
- Involved in data modeling and data warehousing using ETL tools along with Teradata.
- Developed Pig scripts for data change notification and delta record processing between newly arrived data and existing data in HDFS.
- Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH3 distribution, with Puppet, Python, and AWS CloudFormation templates.
- Involved in loading and transferring data into HDFS and Hive using Sqoop and Kafka.
- Provided support for MapReduce programs running on the cluster.
- Designed a custom Kafka broker configuration that reduced message retention from the default 7 days to 30 minutes, architecting a lightweight Kafka broker.
Environment: Hadoop, MapReduce, MongoDB, YARN, Hive, Pig, HBase, Kafka, Oozie, Sqoop, Ab Initio, Flume, Core Java, Cloudera, Django, Talend, Impala, Python, HDFS, Eclipse.
Confidential
Big Data Hadoop Developer
Responsibilities:
- Converted the existing relational database model to the Hadoop ecosystem.
- Generated datasets and loaded them into the Hadoop ecosystem.
- Worked with Linux systems and RDBMS on a regular basis to ingest data using Sqoop.
- Worked with Spark to create structured data from the pool of unstructured data received.
- Managed and reviewed Hadoop and HBase log files.
- Loaded CDRs into the Hadoop cluster from relational databases using Sqoop and from other sources using Flume.
- Created Hive tables and worked with them using HiveQL.
- Wrote Spark code to convert unstructured data to structured data.
- Designed and implemented Spark jobs to support distributed data processing.
- Supported the existing MapReduce programs running on the cluster.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
- Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
- Followed agile methodology for the entire project.
- Developed Pig Latin scripts to extract and filter relevant data from web server output files and load it into HDFS.
- Installed and configured the Apache Hadoop, Hive, and Pig environment.
- Prepared technical design documents and detailed design documents.
Environment: Linux (Ubuntu), Hadoop 1.2.1 (pseudo-distributed mode), HDFS, Hive 0.12, Flume, Hortonworks, Spark, Scala.
Confidential
Big Data Hadoop Consultant
Responsibilities:
- Responsible for launching and setting up the Hadoop/HBase cluster, including configuring the different components of Hadoop and HBase.
- Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Experienced in loading data from the UNIX file system to HDFS.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
- Worked on writing transformer/mapping MapReduce pipelines using Java.
- Created tables, partitions, and buckets, and performed analytics using ad hoc Hive queries.
- Experience with CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
- Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs that trigger independently based on time and data availability.
- Managed real-time data processing and real-time data ingestion into HBase and Hive using Storm.
- Managed and scheduled Jobs on a Hadoop cluster.
- Performed tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Involved in defining job flows, managing, and reviewing log files.
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Sqoop, Oozie, Unix.
