
Data Engineer Resume


Kansas City, KS

SUMMARY

  • 7+ years of IT experience in the analysis, design, implementation, development, maintenance, and testing of large-scale applications using SQL, Hadoop, Java, Splunk, Elasticsearch, Kibana, Logstash, and other Big Data technologies.
  • Expertise in using major Hadoop ecosystem components such as HDFS, YARN, MapReduce, Hive, Impala, Pig, Sqoop, HBase, Spark, Spark SQL, Kafka, Spark Streaming, Flume, Oozie, ZooKeeper, and Hue.
  • Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for reporting and data analysis.
  • Experience importing and exporting data between HDFS and relational database systems using Sqoop, and loading it into partitioned Hive tables.
  • Hands-on experience with VPN, PuTTY, and WinSCP.
  • Experience in data load management, importing and exporting data using Sqoop and Flume.
  • Good knowledge of writing MapReduce jobs through Pig, Hive, and Sqoop.
  • Extensive knowledge of writing Hadoop jobs in Hive for data analysis per business requirements. Wrote HiveQL queries for data extraction and join operations, wrote custom UDFs as required, and have good experience optimizing Hive queries.
  • Good experience with partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
  • Worked with various file formats such as delimited text files, clickstream log files, Apache log files, Avro, JSON, and XML. Proficient in columnar file formats such as RCFile, ORC, and Parquet. Good understanding of the compression codecs used in Hadoop processing, such as gzip, Snappy, and LZO.
  • Hands-on experience with NoSQL databases such as MongoDB, HBase, and Cassandra, covering both functionality and implementation, as well as working experience with Azure.
  • Experience extracting files from MongoDB through Sqoop, placing them in HDFS, and processing them.
  • Experience developing and automating applications using Unix shell scripting in the Big Data field, using MapReduce programming for batch processing of jobs on an HDFS cluster, Hive, and Pig.
  • Experience with various Big Data technologies and tools such as Talend, Pentaho, Informatica, Amazon Redshift, Amazon S3, Amazon EC2, Tableau, and Business Objects, working with databases such as Oracle, DB2, Vertica, MySQL, and Redshift.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, as well as MapReduce concepts.
  • Hands-on experience with Hadoop ecosystem components such as HDFS, MapReduce, Cloudera (MRv1, YARN), Pig, Hive, HBase, Sqoop, Flume, Kafka, Impala, and Oozie, and with programming in Spark using Python and Scala.
  • Designed and built scalable distributed data solutions using native Apache Hadoop, Cloudera, Hortonworks, Spark, and Hive.
  • Skilled in the data processing phases of collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
  • Skilled in AWS serverless and platform services such as AWS Elastic Beanstalk, API Gateway, and Lambda.
  • Experienced with Amazon Web Services (AWS) cloud services such as EMR, EC2, S3, and EBS, and with IAM entities, roles, and users.
  • Excellent technical and analytical skills, with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
  • Experience in Vertica performance tuning, including creating projections and swapping partitions.
  • Experience with BI reporting using AtScale OLAP for Big Data.
  • Experienced in Ansible, Jenkins, and PySpark.
  • Designed, configured, and deployed a multitude of applications on the AWS stack (including EC2, Route 53, S3, RDS, CloudFormation, CloudWatch, SQS, and IAM), focusing on high availability, fault tolerance, and auto-scaling. Set up databases in AWS using RDS and storage using S3 buckets, and configured instance backups to S3.
  • Working knowledge of the Spark RDD, DataFrame, Dataset, and Data Source APIs, Spark SQL, and Spark Streaming.
  • Hands-on knowledge of core Java concepts such as exceptions, collections, data structures, I/O, multithreading, and serialization/deserialization for streaming applications.

TECHNICAL SKILLS

Big Data/Hadoop Technologies: MapReduce, Spark, Spark SQL, Azure, Spark Streaming, Kafka, PySpark, Pig, Hive, HBase, Flume, YARN, Oozie, ZooKeeper, Hue, Ambari Server

Languages: HTML5, DHTML, WSDL, CSS3, C, C++, XML, R/RStudio, SAS Enterprise Guide, SAS, R (caret, Weka, ggplot), Perl, MATLAB, Mathematica, FORTRAN, DTD, XML Schema, JSON, Ajax, Java, Scala, Python (NumPy, SciPy, Pandas, Gensim, Keras), JavaScript, Shell Scripting

NoSQL Databases: Cassandra, HBase, MongoDB, MariaDB

Development Tools: Microsoft SQL Studio, IntelliJ, Azure Databricks, Eclipse, NetBeans

Public Cloud: EC2, IAM, S3, Auto Scaling, CloudWatch, Route 53, EMR, Redshift

Development Methodologies: Agile/Scrum, UML, Design Patterns, Waterfall

Build Tools: Jenkins, Toad, SQL*Loader, PostgreSQL, Talend, Maven, Ant, RTC, RSA, Control-M, Oozie, Hue, SoapUI

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos

Databases: Microsoft SQL Server 2008, 2010/2012, MySQL 4.x/5.x, Oracle 11g/12c, DB2, Teradata, Netezza

Operating Systems: Windows (all versions), UNIX, Linux, macOS, Sun Solaris

PROFESSIONAL EXPERIENCE

Data Engineer

Confidential, Kansas City, KS

Responsibilities:

  • Developed new Spark SQL ETL logic to migrate the fact and dimension tables used for analytics and make them available.
  • Indexed logs from the data lake into Elasticsearch using Spark for visualization in Kibana.
  • Installed and configured Elasticsearch and managed the system for data ingestion.
  • Shipped indexed documents from HDFS to Elasticsearch, and wrote Scala scripts for querying and bulk-ingesting DataFrames using the embedded elastic4s (Scala) client for CRUD operations.
  • Developed a PySpark SQL application for Big Data migration from Teradata to Hadoop, reducing memory utilization in Teradata analytics.
  • Developed Pig scripts to establish the data flow for store- and item-level watch-list exception reporting.
  • Developed a shell script that fetches store, max-timestamp, and date combinations from Hive tables, passes them as parameters to the Pig script, and establishes a connection to the MySQL database.
  • Designed, configured, and deployed applications on the AWS stack (including EC2, Route 53, S3, RDS, CloudFormation, CloudWatch, SQS, and IAM), focusing on high availability, fault tolerance, auto-scaling, load balancing, capacity monitoring, and alerting.
  • Created pipelines in Azure Data Factory (ADF) using linked services, datasets, and pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and a write-back tool, in both directions.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries to analyze the data.
  • Developed and implemented real-time data pipelines with Spark Streaming, Kafka, and Cassandra to replace the existing Lambda architecture without losing its fault-tolerance capabilities.
  • Implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources, and developed Spark applications using Scala and Java.
  • Created a Spark Streaming application to consume real-time data from Kafka sources, and applied real-time analysis models that update on new data in the stream as it arrives.
  • Used Spark Structured Streaming to consume data from Kafka in real time, apply the necessary transformations and data models, and persist the results into Cassandra.
  • Wrote transformations and actions on DataFrames, and used Spark SQL to access Hive tables from Spark for faster data processing.
  • Involved in configuring Elasticsearch, Logstash, and Kibana (ELK) stacks, and in Elasticsearch performance tuning and optimization.
  • Converted JSON features into the Elastic Stack, from Logstash through to Kibana.
  • Configured Flume to move log files from servers to Elasticsearch, and analyzed the data using Kibana.
  • Architected solutions on the AWS cloud platform using services such as EC2, ELB, Auto Scaling, EBS, S3, VPC, RDS, SNS, VPN, CloudWatch, and IAM.
  • Strong knowledge of column-oriented NoSQL databases such as HBase and their integration with the Hadoop cluster using connectors.
  • AWS services used: API Gateway, Lambda, EMR, Kinesis, IAM, EC2, S3, EBS, Data Pipeline, VPC, Glacier, and Redshift.
  • Integrated Hadoop with Tableau to generate visualizations such as Tableau dashboards.

Environment: Cloudera, Spark, Hive, Pig, Spark SQL, Spark Streaming, HBase, Sqoop, Kafka, AWS EC2, S3, EMR, RDS, Linux shell scripting, PostgreSQL, MySQL, BigQuery, Cloud Storage, Cloud ML, Dataproc, Datalab, IAM, Cloud SQL, Eclipse, Java/J2EE, Oracle, HTML, PL/SQL, XML, SQL

Hadoop/Spark Developer

Confidential, Southlake, TX

Responsibilities:

  • Imported data from MySQL into HDFS on a regular basis using Sqoop.
  • Worked extensively with Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and Spark, and with MapReduce programming.
  • Analyzed the data to be loaded into Hadoop and contacted the respective source teams to obtain table information and connection details.
  • Performed aggregations on large amounts of data using Apache Spark and Scala, and landed the data in the Hive warehouse for further analysis.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Involved in loading data from the UNIX file system into HDFS.
  • Created Managed and External Hive tables with static/dynamic partitioning.
  • Wrote Hive queries for data analysis to meet business requirements.
  • Improved HiveQL query performance by splitting large queries into smaller ones and introducing temporary tables between them.
  • Converted some existing Sqoop and Hive jobs into Spark SQL applications that read data from Oracle over JDBC and write it to Hive tables.
  • Developed shell scripts to remove orphan partitions from Hive tables and to manage archive retention in HDFS.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Created Hive tables and worked on them using HiveQL.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Experienced in defining job flows.
  • Gained good experience with Solr and the NoSQL database HBase.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Wrote data from Spark to AWS S3 buckets.
  • Created Spark Core and Spark SQL applications using Scala/Python to process, transform, and enrich data for the TMT and Usage applications.
  • Created an Apache NiFi application to receive real-time data from a web server for the TSO application.
  • Exported data from Hadoop to AWS Redshift and Teradata.
  • Developed a custom FileSystem plugin for Hadoop so that it can access files on the Data Platform.
  • This plugin allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly.
  • Implemented Data Quality framework using AWS Athena, Snowflake, Airflow and Python.
  • Experience with ETL workflow management tools such as Apache Airflow, and significant experience writing Python scripts to implement the workflows.
  • Designed and implemented ETL pipelines from various relational databases to the data warehouse using Apache Airflow.
  • Developed Python scripts to automate the ETL process using Apache Airflow, as well as cron scripts on Unix.
  • Tested Spark jobs, BigQuery, Airflow DAGs, and Kafka streaming.
  • Used Impala for query processing.
  • Extensive experience in unit testing by creating test cases.
  • Used Kafka and Spark Streaming for streaming.
  • Experience with development methodologies such as Agile and Waterfall.
  • Experience with code repositories such as GitHub.

Environment: Hadoop, MapReduce, HDFS, Hive, Python, Python SQL, PySpark, EMR, Airflow, Scala, Spark, Spark SQL, DynamoDB, Redshift, SQL, MapR, AWS S3, Elasticsearch, Git, JIRA, Unix/Linux, Agile methodology, Scrum, Bitbucket, shell scripts, Hadoop daemons.

Software Developer

Confidential

Responsibilities:

  • Worked as a software developer on the back-end and front-end development team.
  • Involved in the Software Development Life Cycle (SDLC), including analysis, design, and implementation.
  • Developed REST web services, and clients to consume those web services as well as other enterprise-wide web services.
  • Implemented Spring RESTful web services that produce JSON.
  • Responsible for system analysis, design, and development using J2EE architecture.
  • Actively participated in the requirements gathering, analysis, design, and testing phases.
  • Responsible for use case diagrams, class diagrams, and sequence diagrams using Rational Rose in the design phase.
  • Implemented the application using MVC architecture, integrating the Hibernate and Spring frameworks.
  • Designed the client application using JavaServer Pages (JSP), Cascading Style Sheets (CSS), and XML.
  • Implemented Enterprise JavaBeans (EJB) to handle various transactions.
  • Worked in a Linux environment and performed extensive configuration on Linux.
  • Developed web services to transfer data between client and server in both directions using REST, SOAP, WSDL, and UDDI.
  • Used the Java Financial platform to build an application integrating technologies such as Struts and Web Flow.
  • Designed the application by implementing Struts based on the MVC architecture, with simple JavaBeans as the Model, JSP UI components as the View, and Action Servlets as the Controller.
  • Developed an MVC design pattern-based user interface using JSP, XML, HTML, and Struts.
  • Developed custom validations and used Struts Validator framework validations to validate user input.
  • Used JDBC to retrieve data from the database for various queries.
  • Performed application design, development, maintenance, enhancements, and testing using the JUnit framework.
  • Used J2EE design patterns such as Controller, Singleton, Factory, and MVC in this application.
  • Implemented the Spring Framework IoC (Inversion of Control) design pattern to manage relationships between application components.
  • Used Hibernate to map claim data by connecting to the Oracle database.
  • Designed and developed REST-based microservices using Spring Boot and Spring Data JPA.
  • Used Hibernate extensively for database access with HQL (Hibernate Query Language) queries.

Environment: JAVA, J2EE, MVC, Spring framework, JSP, CSS, XML, JavaBeans, Linux, Web Services, SOAP, Struts, HTML, SQL, JDBC, JUnit, Oracle, Hibernate
