Hadoop/Spark Developer Resume
SUMMARY
- I have 7+ years of professional IT experience across the complete project life cycle (design, development, testing and implementation), including 3+ years in the ingestion, storage, querying, processing and analysis of Big Data, with hands-on experience in the Hadoop ecosystem (YARN, HDFS) and its components: Hive, Pig, HBase, Sqoop, Hue, Kafka, Flume, Oozie, ZooKeeper, Spark, Spark SQL and Spark Streaming.
- I have worked hands-on with Hadoop clusters on Hortonworks, Cloudera and AWS Elastic MapReduce.
- I have hands-on experience in improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- I have working experience in building Spark applications using build tools like SBT, Maven and Gradle.
- I have good experience in dealing with different file formats like text, SequenceFile, RCFile, ORC, Parquet, Avro and JSON, and different compression formats like GZip, LZO, BZip2 and Snappy.
- I have good knowledge of relational databases like MySQL and Oracle and NoSQL databases like HBase and MongoDB, working knowledge of UNIX/Linux systems including shell scripting, and working experience in handling semi-structured and unstructured data from different data sources.
- I have working experience in developing MapReduce programs using combiners, map-side joins, reduce-side joins, the distributed cache, compression techniques, and multiple inputs and outputs.
- I have working experience in performing ad-hoc analysis on structured data using HiveQL, joins and Hive UDFs, and good exposure to counters, shuffle and sort parameters, dynamic partitions and bucketing for performance improvement.
- I have worked with IDEs like Eclipse and IntelliJ IDEA.
- I have working knowledge of Java and SQL in application development and deployment.
TECHNICAL SKILLS
Big Data Technologies: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, Oozie, Apache Spark, Spark SQL, Spark Streaming.
Process/Data Modeling: MS Visio, UML Diagrams and ER Studio
Cluster Manager Tools: HDP Ambari, Cloudera Manager, Hue
ETL/ELT/Databases: HBase, MongoDB, Spark SQL, MS Access, Oracle, DB2, MySQL, SQL Developer, SQL Server and Toad
Languages: C, C++, Java, PL/SQL, Python, Scala
Web-Technologies: HTML, DHTML, XML, CSS
Microsoft Technologies: ASP.NET, C#.NET, VB.NET, ADO.NET, SharePoint, Word, Excel and PowerPoint.
Operating Systems: Linux, Ubuntu, RHEL, Windows XP/7/8/10.
IDE: Eclipse and IntelliJ IDEA
PROFESSIONAL EXPERIENCE
Confidential
Hadoop/Spark Developer
Responsibilities:
- Worked with the Lambda architecture to handle and process batch and real-time data.
- Ingested data from the data warehouse into HDFS using Sqoop.
- Using Kafka, collected real-time streaming and log data from web applications, along with clickstream data; analyzed part of the data using Spark Streaming and stored the rest in HDFS for future use.
- Wrote Hive queries in Hive Query Language (HiveQL) to analyze data in the Hive warehouse, and worked with Hive tables, partitioning and bucketing.
- Performed data profiling and identified data quality and validation rules for data integrity and data quality as they relate to the impact on business requirements.
- Built Spark applications using SBT.
- Used Spark SQL to process large volumes of structured data.
- Connected to Tableau Server to publish dashboards to a central location for portal integration.
- Created metrics, attributes, filters, reports and dashboards, along with advanced chart types, visualizations and complex calculations to manipulate the data.
Environment: Cloudera Manager, Sqoop, Java (JDK 1.8), Hive, Spark, Spark SQL, Scala, Tableau.
Confidential
Hadoop/Spark Developer
Responsibilities:
- Ingested flat files from local UNIX file systems into HDFS and, using Sqoop, ingested structured data from legacy RDBMS systems to
- Developed code for importing and exporting data into HDFS and Hive using Sqoop.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Used the DataFrame API in Scala to work with distributed collections of data organized into named columns, and developed predictive analytics using the Apache Spark Scala APIs.
- Worked with Apache Hadoop ecosystem components like HDFS, Hive and Sqoop, as well as with Spark and Scala.
- Wrote Hive join queries to fetch information from multiple tables, and wrote multiple MapReduce jobs to collect output from Hive.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Used Oozie workflows to run Hive jobs.
- Extracted files through Sqoop, placed them in HDFS and processed them.
Environment: Hadoop, Spark, HDFS, Scala, Hive, Java, Spring, MapReduce, Sqoop, Spring MVC, Big Data, Spark SQL, JDBC, Oozie, Pig, Flume
Confidential
Hadoop Developer
Responsibilities:
- Analyzed data using different big data analytic tools including Pig, Hive and MapReduce.
- Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
- Implemented Partitioning, Dynamic Partitions, and Buckets in Hive on Avro files to meet the business requirements.
- Implemented data integrity and data quality checks using Linux scripts.
- Used Flume to tail the application log files into HDFS.
- Involved in scheduling Hive and Pig jobs using Oozie workflows.
- Involved in performance tuning and memory optimization of MapReduce and Hive applications.
- Worked on end-to-end automation of the application.
- Responsible for continuous Build/Integration with Jenkins and deployment using XL Deploy.
- Actively involved in code reviews, bug fixes and enhancements.
Environment: Hadoop, HDFS, MySQL, Apache Hive, Pig, MapReduce, Core Java, Shell Scripting, Eclipse, Git, Jenkins.
