
Big Data Developer Resume


NY

OBJECTIVE:

  • Qualified professional and certified Hortonworks Developer (HDPCD) with 6+ years of overall experience, seeking a Big Data Hadoop Developer role in open-source technologies such as Hadoop and Spark, where well-honed skills in Hadoop-related technologies can be used to help users develop their applications using the most modern development methods and methodologies.

SUMMARY

  • Good knowledge of open-source Apache Hadoop, the HDFS file system, YARN, Sqoop, Hive, Spark, Python, and core Java on the Hortonworks Data Platform (HDP 2.x).
  • Very good understanding of developing data pipelines with distributed technologies using tools such as Sqoop, Hive, and Spark.
  • Experience working with BI teams to transform big data requirements into Hadoop-centric technologies.
  • Hands-on experience with the Hadoop ecosystem - HDFS, MapReduce, Hive, Spark, Kafka, and HBase/Phoenix.
  • Knowledge in developing end-to-end data ingestion pipelines that connect various data sources outside the Hadoop environment, such as RDBMS databases and log files, into the Hadoop data lake.
  • Good knowledge of Linux operating systems across various distributions, including RHEL (4, 5, 6) and CentOS.
  • Worked on different file formats like ORC, JSON, and Avro.
  • Proficient in Apache Spark and Spark SQL.
  • Expertise in using the Spark framework for batch and real-time data processing.
  • Good working knowledge of various RDBMS databases such as Oracle 10g, 11g, and 12c, SQL Server, DB2, and MySQL.
  • Excellent SQL development knowledge, including Data Definition Language (DDL), Data Query Language (DQL), and Data Manipulation Language (DML).
  • Good knowledge of database design to ensure stability, reliability, and performance.
  • Expertise in drafting complex queries, including various kinds of joins, to address business needs.
  • Expertise in data normalization and denormalization techniques.
  • Experience in troubleshooting errors in HBase Shell, Pig, Hive and MapReduce.
  • Hands-on experience implementing sequence files, combiners, counters, dynamic partitions, and bucketing for best practices and performance improvement (see the sketch following this summary).
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, MapReduce, HDFS Federation in Hadoop 2, High Availability, and YARN, along with a good understanding of workload management, scalability, and distributed platform architectures.
  • Well versed in security concepts such as Kerberos authentication and Ranger authorization.
  • Experienced in moving data from different sources using Kafka producers and consumers and preprocessing the data using Spark Streaming.
  • Strong analytical and interpersonal skills, with the ability to work in a team and individually, plus a can-do attitude and the ability to grasp and understand new technology concepts.
  • Technical expertise in UNIX/Linux shell commands.
  • Expert in UNIX shell scripting (ksh, bash, sh) and excellent knowledge of Python scripting.
  • Experienced with patch and package administration, including building packages and RPMs.
  • Experienced in diagnosing and resolving TCP/IP connection problems.
  • Experienced in creating Disaster Recovery (DR) and disaster contingency plans.
  • Excellent written and verbal communication skills.
  • Ability to operate effectively on a 24x7 basis in crisis situations.
  • Effective team player wif excellent logical and analytical abilities.
  • In-depth knowledge of computer applications and scripting like Shell, Python and XML.
  • Used cron and AutoSys for enterprise job scheduling.
  • Proficient in working wif Java, bash and Python scripts.
  • Expertise in querying RDBMSs such as Oracle, MySQL, and SQL Server.
  • Expertise in data migration from various databases to Hadoop HDFS and Hive using Sqoop.
  • Expertise in building data pipelines using Python.
  • In-depth knowledge of the Hadoop ecosystem and the MapReduce framework.
  • Worked with Hive’s data warehousing infrastructure to analyze large structured datasets.
  • Skilled in creating managed tables and external tables in the Hive ecosystem.
  • Extensive working knowledge in designing and developing business processes using Hive, Sqoop, HBase, and Spark.
  • Experience working with version control tools such as SVN and Git revision control systems (GitHub), JIRA/Mingle for tracking issues, and Crucible for code reviews.
  • Good understanding of the principles and best practices of Software Configuration Management (SCM); worked closely with development, QA, and other teams to ensure automated test efforts were tightly integrated with the build system and to fix errors during deployment and builds.
  • Extensive experience in all phases of the Software Development Life Cycle, with emphasis on designing, developing, implementing, deploying, and supporting distributed, scalable, secure, and transactional enterprise J2EE applications.
  • Experience in UNIX shell scripting, FTP, SFTP and file management in various UNIX environments.
  • Excellent communication and interpersonal skills. Detail-oriented, self-motivated, quick learner, responsible, analytical, time-bound team player with the ability to coordinate in a team environment.
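
As an illustration of the dynamic-partitioning practice referenced above, the following is a minimal PySpark sketch; the table names (web_logs, staging_web_logs) and columns are hypothetical rather than drawn from any project listed here. Bucketing is declared analogously by adding CLUSTERED BY (...) INTO N BUCKETS to the Hive DDL.

```python
from pyspark.sql import SparkSession

# Sketch: create a date-partitioned ORC table in Hive and load it with
# dynamic partitioning, so partition values are derived from the data itself.
spark = (SparkSession.builder
         .appName("dynamic-partition-sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("""
    CREATE TABLE IF NOT EXISTS web_logs (
        user_id  STRING,
        url      STRING,
        duration INT
    )
    PARTITIONED BY (event_date STRING)
    STORED AS ORC
""")

# The partition column is listed last in the SELECT, so each row lands in
# the matching event_date partition without naming partitions one by one.
spark.sql("""
    INSERT OVERWRITE TABLE web_logs PARTITION (event_date)
    SELECT user_id, url, duration, event_date
    FROM staging_web_logs
""")
```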

TECHNICAL SKILLS

Open Source technologies: Apache Hadoop, HDFS, YARN, Capacity Scheduler, Hive, Spark (PySpark), Sqoop, HBase, Kafka, ZooKeeper

Operating Systems: UNIX, Linux, macOS, Windows

Version Control tools: Git, GitHub, GitLab

Scripting and Markup Languages: Python, Shell, Bash, XML, JSON, YAML

Build Tools: Maven, Ant

Web Technologies: JDBC, XML, HTML

BUG Tracking tools: JIRA

Programming Languages: Python, C, Core Java

Scripting: Shell scripting, Python, SQL

Databases: MySQL, Oracle, SQL Server, DB2

PROFESSIONAL EXPERIENCE:

Confidential, NY

Big Data Developer

Responsibilities:

  • Used Hortonworks Data Platform (HDP 2.5.0, HDP 2.6.1) components to build various data pipelines.
  • Used ingestion tools such as HDFS put commands to ingest batch files and log files, Sqoop for ingesting data from RDBMS systems such as Oracle, SQL Server, and MySQL into the data lake, and Spark Streaming for ingesting real-time data from Kafka topics.
  • Used transformation tools such as Apache Hive/Tez and Apache Spark (PySpark) to transform various data sources such as JSON, CSV, and pipe-delimited files into HDFS datasets, Hive tables, and Phoenix tables.
  • Worked on Spark/PySpark programming to create UDFs.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames in Python (see the DataFrame sketch following this list).
  • Transferred purchase transaction details from legacy systems to HDFS.
  • Wrote Python scripts to make Spark Streaming work with Kafka as part of the Spark-Kafka integration efforts (see the streaming sketch following this list).
  • Used Sqoop to transfer data between relational databases and Hadoop.
  • Worked on HDFS to store and access huge datasets within Hadoop.
  • Good hands-on experience with Git and GitHub.
  • Built on-premise data pipelines using Kafka and Spark for real-time data analysis.
  • Tuned various Hive and Spark jobs for better performance using techniques such as partitioning, bucketing, and repartitioning.
  • Created many HQL schemas and utilized them throughout the program wherever required.
  • Analyzed existing code and made bug fixes wherever required.
  • Evaluated various file formats for better performance.
  • Developed various Hive scripts, PySpark scripts and shell scripts to deploy Hadoop applications in production.
  • Converted unstructured data to structured data by writing Spark code.
  • Participated in building scalable distributed data solutions using Hadoop.
  • Performed data imports from several data sources and transformations using Hive and Spark SQL (PySpark).
  • Performed various data validations, data cleansing, and data aggregation using a series of Spark applications.
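
A minimal sketch of converting a Hive/SQL-style query into Spark DataFrame transformations with a Python UDF, as referenced above; the transactions table, its columns, and the masking logic are hypothetical stand-ins rather than actual project code.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = (SparkSession.builder
         .appName("hive-to-dataframe-sketch")
         .enableHiveSupport()
         .getOrCreate())

# A Python UDF that masks all but the last four characters of a card number.
def mask_card(card_no):
    return "****" + card_no[-4:] if card_no else None

mask_card_udf = F.udf(mask_card, StringType())

# Read the Hive table as a DataFrame and apply the UDF while selecting columns.
txns = spark.table("transactions")
masked = txns.select(
    "store_id",
    "txn_date",
    "amount",
    mask_card_udf(F.col("card_no")).alias("card_no_masked"))

# Equivalent of a Hive query such as:
#   SELECT store_id, SUM(amount) FROM transactions
#   WHERE txn_date = '2017-06-01' GROUP BY store_id
# expressed as DataFrame transformations instead of raw SQL.
daily_totals = (masked
                .filter(F.col("txn_date") == "2017-06-01")
                .groupBy("store_id")
                .agg(F.sum("amount").alias("total_amount")))

daily_totals.write.mode("overwrite").saveAsTable("daily_store_totals")
```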
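A minimal sketch of the Spark Streaming and Kafka integration referenced above, using the direct-stream API available in the Spark 1.6/2.x line; the topic, broker addresses, and HDFS path are placeholders, and the spark-streaming-kafka package is assumed to be on the classpath.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

# Sketch: read events from a Kafka topic in 10-second micro-batches and
# land the raw messages in HDFS for downstream Hive/Spark processing.
sc = SparkContext(appName="kafka-streaming-sketch")
ssc = StreamingContext(sc, 10)

stream = KafkaUtils.createDirectStream(
    ssc,
    ["purchase-events"],                                    # placeholder topic
    {"metadata.broker.list": "broker1:6667,broker2:6667"})  # placeholder brokers

# Each record arrives as a (key, value) pair; keep the raw value string.
raw_events = stream.map(lambda kv: kv[1])

# Write each micro-batch under the given prefix; Spark appends the batch time.
raw_events.saveAsTextFiles("hdfs:///data/raw/purchase_events/batch")

ssc.start()
ssc.awaitTermination()
```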

Environment: RHEL 6.7, Hortonworks Data Platform (2.5.0 and 2.6.1), Apache Hadoop 2.7.1, Apache YARN 2.7.1, Apache Hive 1.2.1, Apache Spark 1.6.x and 2.2.x, Phoenix 4.7, HBase 1.1, Kafka 0.10, Kerberos, Ranger

Confidential, PA

Big Data Developer

Responsibilities:

  • Used Hortonworks Data Platform (HDP 2.2 and HDP 2.3) components to analyze and build data models.
  • Used various Stinger initiative features such as Tez, ORC, and vectorization in Hive 0.13 to develop optimized data pipelines.
  • Worked with data science teams to build datasets suitable for building models in R and SAS.
  • Profiled various Hive tables for data quality.
  • Leveraged various file formats for better query performance.
  • Benchmarked various compression formats to measure query read and write performance.
  • Experience in managing and reviewing Hadoop Log files.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally in MapReduce/Tez (see the sketch following this list).
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data with MapReduce, Hive, and Pig.
  • Created partitioned tables in Hive and mentored analysts and the SQA team in writing Hive queries.
  • Imported data into HDFS and performed data extracts from Oracle and flat files into HDFS with Sqoop.
  • Exported the analyzed data to relational databases using Sqoop for visualization and generated reports for the BI team.
  • Analyzed large data sets to determine the optimal way to aggregate them and reported on the results.
  • Implemented various Hive scripts for further analysis and calculated various metrics used for downstream reporting.
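
A brief sketch of the partitioned, ORC-backed Hive tables described above, written against the Spark 1.6-era HiveContext API; the table and column names are hypothetical, and the Tez/vectorization settings noted in the comments would be applied on the Hive side.

```python
from pyspark import SparkContext
from pyspark.sql import HiveContext

# Sketch: define a partitioned, ORC-backed Hive table and query it through
# HiveContext. The corresponding Stinger-style settings on the Hive side were,
# for example:
#   SET hive.execution.engine=tez;
#   SET hive.vectorized.execution.enabled=true;
sc = SparkContext(appName="hive-orc-sketch")
hive = HiveContext(sc)

hive.sql("""
    CREATE TABLE IF NOT EXISTS sales_orc (
        order_id STRING,
        amount   DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS ORC
""")

# Load one day's partition from a staging table, then aggregate by date.
hive.sql("""
    INSERT OVERWRITE TABLE sales_orc PARTITION (order_date='2015-03-01')
    SELECT order_id, amount FROM sales_staging WHERE order_date = '2015-03-01'
""")

daily = hive.sql("""
    SELECT order_date, SUM(amount) AS total_amount
    FROM sales_orc
    GROUP BY order_date
""")
daily.show()
```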

Environment: RHEL 6, CentOS 6, Hortonworks Data Platform (2.2 and 2.3), Apache Hadoop 2.6, Apache YARN 2.6, Apache Hive 0.13, Apache Spark 1.6.x, HBase

Confidential, IL

Java Developer

Responsibilities:

  • Involved in designing new Riders and products.
  • Coded key modules such as the calculation engine.
  • Reviewed code, unit test cases, and system integration test cases.
  • Proposed a rewrite of the Sales Illustration calculation engine.
  • Proposed and implemented automation of manual work via the SI Admin Web UI.
  • This reduced the manual code development, deployment, testing, and release phases by 30%.

Environment: Java, JSP, Servlets, Struts, SOAP, CSS, Oracle, WebLogic 9.0, MyEclipse
